Abstract. We consider the Random Walk Metropolis algorithm on $\mathbb{R}^n$ with Gaussian proposals, when the target probability measure is the n-fold product of a one-dimensional law. It is well known (see Roberts et al. (1997)) that, in the limit $n \to \infty$, starting at equilibrium and for an appropriate scaling of the variance and of the timescale as a function of the dimension n, a diffusive limit is obtained for each component of the Markov chain. In Jourdain et al. (2012), we generalize this result to the case where the initial distribution is not the target probability measure. The obtained diffusive limit is the solution to a stochastic differential equation nonlinear in the sense of McKean. In the present paper, we prove convergence to equilibrium for this equation. We discuss practical counterparts in order to optimize the variance of the proposal distribution to accelerate convergence to equilibrium. Our analysis confirms the interest of the constant acceptance rate strategy (with acceptance rate between 1/4 and 1/3) first suggested in Roberts et al. (1997).

We also address scaling of the Metropolis-Adjusted Langevin Algorithm. When starting at equilibrium, a diffusive limit for an optimal scaling of the variance is obtained in Roberts and Rosenthal (1998). In the transient case, we obtain formally that the optimal variance scales very differently in n depending on the sign of a moment of the distribution, which vanishes at equilibrium. This suggests that it is difficult to derive practical recommendations for MALA from such asymptotic results.



1. Introduction 

Many Markov Chain Monte Carlo (MCMC) methods are based on the Metropolis-Hastings algorithm, see Metropolis et al. (1953); Hastings (1970). To set up the notation, let us recall this well-known sampling technique. Let us consider a target probability distribution on $\mathbb{R}^n$ with density p. Starting from an initial random variable $X_0$, the Metropolis-Hastings algorithm generates iteratively a Markov chain $(X_k)_{k \ge 0}$ in two steps. At time k, given $X_k$, a candidate $Y_{k+1}$ is sampled using a proposal distribution with density $q(X_k, y)$. Then, the proposal $Y_{k+1}$ is accepted with probability $\alpha(X_k, Y_{k+1})$, where

$$\alpha(x,y) = 1 \wedge \frac{p(y)\,q(y,x)}{p(x)\,q(x,y)}.$$

Here and in the following, we use the standard notation $a \wedge b = \min(a,b)$. If the proposed value is accepted, then $X_{k+1} = Y_{k+1}$, otherwise $X_{k+1} = X_k$. The Markov chain $(X_k)_{k \ge 0}$ is by construction reversible with respect to the target density p, and thus admits p as an invariant distribution. The efficiency of this algorithm highly depends on the choice of the proposal distribution q. One common choice is a Gaussian proposal centered at point $x \in \mathbb{R}^n$ with variance $\sigma^2\,\mathrm{Id}_{n \times n}$:

$$q(x,y) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{|x-y|^2}{2\sigma^2}\right).$$

Since the proposal is symmetric ($q(x,y) = q(y,x)$), the acceptance probability reduces to

$$\alpha(x,y) = 1 \wedge \frac{p(y)}{p(x)}. \tag{1}$$

Metropolis-Hastings algorithms with symmetric kernels are called random walk Metropolis (RWM). Another popular choice yields the so-called Metropolis adjusted Langevin algorithm (MALA). In this case, the proposal distribution is a Gaussian random variable with variance $\sigma^2\,\mathrm{Id}_{n \times n}$ and centered at point $x + \frac{\sigma^2}{2}\nabla \ln(p(x))$ (in particular, it is not symmetric). It corresponds to one step of a time-discretization with timestep $\sigma^2$ of the (overdamped) Langevin dynamics $dX_t = dB_t + \frac{1}{2}\nabla \ln p(X_t)\,dt$, which is ergodic with respect to $p(x)\,dx$ (here, $B_t$ is a standard n-dimensional Brownian motion).
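To fix ideas, the two proposal mechanisms can be sketched in a few lines of Python for a product target $p(x) = \prod_i e^{-V(x_i)}$. This is only an illustrative sketch: the function names (`rwm_step`, `mala_step`), the argument `rng` and the choice of passing `V` and its derivative `dV` as callables are our own assumptions, not notation from the text.

```python
import math
import random

def log_target(x, V):
    """Log-density of the product target p(x) = prod_i exp(-V(x_i))."""
    return -sum(V(xi) for xi in x)

def rwm_step(x, sigma, V, rng):
    """One RWM step with Gaussian proposal N(x, sigma^2 Id)."""
    y = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    # Symmetric proposal: q cancels and the ratio reduces to p(y)/p(x).
    log_ratio = log_target(y, V) - log_target(x, V)
    accepted = math.log(1.0 - rng.random()) <= log_ratio
    return (y if accepted else x), accepted

def mala_step(x, sigma, V, dV, rng):
    """One MALA step with proposal N(x - (sigma^2/2) V'(x), sigma^2 Id)."""
    def log_q(a, b):
        # log q(a, b) up to an additive constant that cancels in the ratio.
        return -sum((bi - ai + 0.5 * sigma**2 * dV(ai)) ** 2
                    for ai, bi in zip(a, b)) / (2.0 * sigma**2)
    y = [xi - 0.5 * sigma**2 * dV(xi) + sigma * rng.gauss(0.0, 1.0) for xi in x]
    log_ratio = (log_target(y, V) + log_q(y, x)
                 - log_target(x, V) - log_q(x, y))
    accepted = math.log(1.0 - rng.random()) <= log_ratio
    return (y if accepted else x), accepted

# Usage on a standard Gaussian potential (up to an additive constant).
rng = random.Random(0)
V = lambda u: 0.5 * u * u
dV = lambda u: u
state = [2.0] * 10
state, ok_rwm = rwm_step(state, 0.3, V, rng)
state, ok_mala = mala_step(state, 0.3, V, dV, rng)
```

Working with the logarithm of the acceptance ratio avoids overflow when the dimension is large; the uniform draw is taken in $(0,1]$ so that its logarithm is always finite.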

In both cases (RWM and MALA), the variance $\sigma^2$ remains to be chosen. It should be sufficiently large to ensure a good exploration of the state space, but not too large, since otherwise the rejection rate typically becomes very high because the proposed moves fall in low probability regions, in particular in high dimension. It is expected that the higher the dimension, the smaller the variance of the proposal should be. The first theoretical results to optimize the choice of $\sigma^2$ in terms of the dimension n can be found in Roberts et al. (1997). The authors study the RWM algorithm under two fundamental (and somewhat restrictive) assumptions: (i) the target probability distribution is the n-fold tensor product of a one-dimensional density:

$$p(x) = \prod_{i=1}^{n} \exp(-V(x_i)) \tag{2}$$

where $x = (x_1, \dots, x_n)$ and $\int_{\mathbb{R}} \exp(-V) = 1$, and (ii) the initial distribution is the target probability (what we refer to as the stationarity assumption in the following):

$$X_0^n \sim p(x)\,dx.$$

The superscript n in the Markov chain $(X_k^n)_{k \ge 0}$ explicitly indicates the dependency on the dimension n. Then, under additional regularity assumptions on V, the authors prove that for a proper scaling of the variance as a function of the dimension, namely

$$\sigma_n^2 = \frac{l^2}{n},$$

where l is a fixed constant, the Markov process $(X^{1,n}_{\lfloor nt \rfloor})_{t \ge 0}$ (where $X_k^{1,n} \in \mathbb{R}$ denotes the first component of $X_k^n \in \mathbb{R}^n$) converges in law to a diffusion process:

$$dX_t = \sqrt{h(I,l)}\,dB_t - h(I,l)\,\frac{1}{2}V'(X_t)\,dt, \tag{3}$$

where

$$h(I,l) = 2l^2\,\Phi\left(-\frac{l\sqrt{I}}{2}\right) \quad\text{and}\quad I = \int_{\mathbb{R}} (V')^2 \exp(-V). \tag{4}$$

Here and in the following, $\lfloor \cdot \rfloor$ denotes the integer part (for $y \in \mathbb{R}$, $\lfloor y \rfloor \in \mathbb{Z}$ and $\lfloor y \rfloor \le y < \lfloor y \rfloor + 1$) and $\Phi$ is the cumulative distribution function of the normal distribution ($\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} \exp(-y^2/2)\,dy$). The scalings of the variance and of the time as functions of the dimension indicate how to make the RWM algorithm efficient in high dimension. Moreover, a practical counterpart of this result is that l should be chosen such that $h(I,l)$ is maximum (the optimal value of l is approximately $2.38/\sqrt{I}$), in order to optimize the time scaling in (3). This optimal value of l corresponds equivalently to an average acceptance rate of 0.234: for this choice, in the limit n large,

$$\iint \alpha(x,y)\,p(x)\,q(x,y)\,dx\,dy = 0.234.$$



Thus, the practical way to choose $\sigma^2$ is to scale it in such a way that the average acceptance rate is roughly 1/4. Similar results have been obtained for the MALA algorithm in Roberts and Rosenthal (1998). In this case, the scaling for the variance is $\sigma_n^2 = \frac{l^2}{n^{1/3}}$, the time scaling is $(X^{1,n}_{\lfloor n^{1/3} t \rfloor})_{t \ge 0}$ and the optimal average acceptance rate is 0.574.
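The numerical values quoted for the RWM algorithm at stationarity can be recovered with a short computation. The sketch below assumes that the speed function is $h(I,l) = 2l^2\Phi(-l\sqrt{I}/2)$ and that the associated limiting acceptance rate is $2\Phi(-l\sqrt{I}/2)$; the helper names and the brute-force grid search are our own illustrative choices.

```python
import math

def Phi(x):
    """Cumulative distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def h(I, l):
    """Speed of the limiting diffusion at stationarity."""
    return 2.0 * l**2 * Phi(-l * math.sqrt(I) / 2.0)

def stationary_acceptance(I, l):
    """Limiting average acceptance rate at stationarity."""
    return 2.0 * Phi(-l * math.sqrt(I) / 2.0)

def best_l(I, l_max=10.0, steps=20000):
    """Grid search for the maximizer of l -> h(I, l) on (0, l_max]."""
    grid = (l_max * (j + 1) / steps for j in range(steps))
    return max(grid, key=lambda l: h(I, l))

I = 1.0
l_star = best_l(I)
acc_star = stationary_acceptance(I, l_star)
print(l_star * math.sqrt(I), acc_star)
```

Since h depends on I only through $l\sqrt{I}$ up to the factor $1/I$, the product $l^\star\sqrt{I}$ and the optimal acceptance rate do not depend on I.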



There exist several extensions of such results for various Metropolis-Hastings algorithms, see Roberts and Rosenthal (1998, 2001); Breyer et al. (2004); Neal and Roberts (2011, 2012); Bédard et al. (2012a,b); Beskos et al. (2012), and some of them relax in particular the first main assumption mentioned above about the separated form of the target distribution, see Breyer and Roberts (2000); Bédard (2007, 2009); Beskos et al. (2009). Extensions to infinite dimensional settings have also been explored, see Mattingly et al. (2012); Pillai et al. (2012); Beskos et al. (2009).

All these results assume stationarity: the initial measure is the target probability. To the best of the authors' knowledge, the only works which deal with a non-stationary case are Christensen et al. (2005), where the RWM and the MALA algorithms are analyzed in the Gaussian case, and Pillai et al. (2011). In the latter paper, the target measure is assumed to be absolutely continuous with respect to the law of an infinite dimensional Gaussian random field and this measure is approximated in a space of dimension n where the MCMC algorithm is performed. The authors consider a modified RWM algorithm (called preconditioned Crank-Nicolson walk) started at a deterministic initial condition and prove that when $\sigma_n$ tends to 0 as n tends to $\infty$ (with no restriction on the rate of convergence of $\sigma_n$ to 0), the rescaled algorithm converges to a stochastic partial differential equation, started at the same initial condition.

The aim of this article is to discuss extensions of the previous results for the RWM and the MALA algorithms, without assuming stationarity. The main findings are the following.

Concerning the RWM algorithm, in the companion paper Jourdain et al. (2012), we prove that, using the same scaling for the variance and the time as in the stationary case (namely $\sigma_n^2 = \frac{l^2}{n}$ and considering $(X^{1,n}_{\lfloor nt \rfloor})_{t \ge 0}$), one obtains in the limit n goes to infinity a diffusion process nonlinear in the sense of McKean (see Equation (7) below). In particular, the scaling of the variance and of the number of samples as a function of the dimension is the same as in the stationary case. This is discussed in Section 2. Contrary to (3), this diffusion process cannot be obtained from the simple Langevin dynamics $dX_t = dB_t - \frac{1}{2}V'(X_t)\,dt$ by a deterministic time-change and its long-time behavior is not obvious. In Section 4, we first prove that its unique stationary distribution is $e^{-V(x)}dx$. Assuming that this measure satisfies a logarithmic Sobolev inequality, we prove that the Kullback-Leibler divergence of the marginal distribution at time t with respect to $e^{-V(x)}dx$ converges to 0 at an exponential rate. In Section 5, we discuss optimizing strategies which take into account the transient phase. Roughly speaking, the usual strategy which consists in choosing l (recall that $\sigma_n^2 = \frac{l^2}{n}$) such that the average acceptance rate is constant (with value between 1/4 and 1/3) seems to be a very good strategy. This is numerically illustrated in Section 6.

Concerning the MALA algorithm, the situation is much more complicated. The scaling to be used seems to depend on the sign of an average quantity (see Section 3.1.3). In particular, the scaling $\sigma_n^2 = \frac{l^2}{n^{1/3}}$ which has been identified in Roberts and Rosenthal (1998) under the stationarity assumption is not adapted to the transient phase. It seems difficult to draw any practical recommendation from this analysis. This is explained in detail in Section 3.
2. Scaling limit for the RWM algorithm

In this section, we state the diffusion limit for the RWM algorithm, and explain formally why this result holds. A rigorous proof can be found in Jourdain et al. (2012).

2.1. The RWM algorithm and the convergence result. We consider a Random Walk Metropolis algorithm using Gaussian proposal with variance $\sigma_n^2 = \frac{l^2}{n}$, and with target p defined by (2). The Markov chain generated using this algorithm writes:

$$X^{i,n}_{k+1} = X^{i,n}_k + \frac{l}{\sqrt{n}}\,G^i_{k+1}\,1_{A_{k+1}}, \quad 1 \le i \le n, \tag{5}$$

with

$$A_{k+1} = \left\{ U_{k+1} \le e^{\sum_{i=1}^n \left( V(X^{i,n}_k) - V\left(X^{i,n}_k + \frac{l}{\sqrt{n}} G^i_{k+1}\right) \right)} \right\},$$

where $(G^i_k)_{i,k \ge 1}$ is a sequence of independent and identically distributed (i.i.d.) normal random variables independent from a sequence $(U_k)_{k \ge 1}$ of i.i.d. random variables with uniform law on [0,1]. We assume that the initial positions $(X^{1,n}_0, \dots, X^{n,n}_0)$ are exchangeable (namely the law of the vector is invariant under permutation of the indices) and independent from all the previous random variables. Exchangeability is preserved by the dynamics: for all $k \ge 1$, $(X^{1,n}_k, \dots, X^{n,n}_k)$ are exchangeable. We denote by $\mathcal{F}^n_k$ the sigma field generated by $(X^{1,n}_0, \dots, X^{n,n}_0)$ and $(G^1_j, \dots, G^n_j, U_j)_{1 \le j \le k}$.

For $t \ge 0$ and $i \in \{1, \dots, n\}$, let

$$Y^{i,n}_t = (\lceil nt \rceil - nt)\,X^{i,n}_{\lfloor nt \rfloor} + (nt - \lfloor nt \rfloor)\,X^{i,n}_{\lceil nt \rceil}$$

be the linear interpolation of the Markov chain obtained by rescaling time (the characteristic time scale is 1/n). This is the classical diffusive time-scale for a random walk, since the variance of each move is of order 1/n.

Let us define the notion of convergence (namely the propagation of chaos) that will be useful to study the convergence of the interacting particle system $((Y^{1,n}_t, \dots, Y^{n,n}_t)_{t \ge 0})_{n \ge 1}$ in the limit n goes to infinity.

Definition 1. Let E be a separable metric space. A sequence $(\chi^n_1, \dots, \chi^n_n)_{n \ge 1}$ of exchangeable $E^n$-valued random variables is said to be $\nu$-chaotic, where $\nu$ is a probability measure on E, if for fixed $j \in \mathbb{N}^*$, the law of $(\chi^n_1, \dots, \chi^n_j)$ converges in distribution to $\nu^{\otimes j}$ as n goes to $\infty$.

According to (Sznitman, 1991, Proposition 2.2), $\nu$-chaoticity is equivalent to a law of large numbers result, namely the convergence in probability of the empirical measures $\mu^n = \frac{1}{n}\sum_{i=1}^n \delta_{\chi^n_i}$ to the constant $\nu$ when the space of probability measures on E is endowed with the weak convergence topology.

We are now in position to state the convergence result for the RWM algorithm, taken from Jourdain et al. (2012).



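Theorem 1. Let m be a probability measure on $\mathbb{R}$ such that $\langle m, (V')^4 \rangle < +\infty$. Let us also assume that

(6) V is a $C^3$ function on $\mathbb{R}$ with bounded second and third order derivatives.

If the initial positions $(X^{1,n}_0, \dots, X^{n,n}_0)_{n \ge 1}$ are m-chaotic and such that $\lim_{n \to \infty} \mathbb{E}[(V'(X^{1,n}_0))^4] = \langle m, (V')^4 \rangle$, then the processes $((Y^{1,n}_t, \dots, Y^{n,n}_t)_{t \ge 0})_{n \ge 1}$ are P-chaotic where P denotes the law (on the space $C(\mathbb{R}_+, \mathbb{R})$ of continuous functions with values in $\mathbb{R}$) of the unique solution to the stochastic differential equation nonlinear in the sense of McKean

$$dX_t = \Gamma^{1/2}\big(\mathbb{E}[(V'(X_t))^2], \mathbb{E}[V''(X_t)], l\big)\,dB_t - \mathcal{G}\big(\mathbb{E}[(V'(X_t))^2], \mathbb{E}[V''(X_t)], l\big)\,V'(X_t)\,dt, \tag{7}$$

where $(B_t)_{t \ge 0}$ is a Brownian motion independent from the initial position $X_0$ distributed according to m. The functions $\Gamma$ and $\mathcal{G}$ are respectively defined by: for $l \in (0, +\infty)$, $a \in [0, +\infty]$ and $b \in \mathbb{R}$,

$$\Gamma(a,b,l) = \begin{cases} l^2\,\Phi\left(-\dfrac{lb}{2\sqrt{a}}\right) + l^2\,e^{\frac{l^2(a-b)}{2}}\,\Phi\left(l\left(\dfrac{b}{2\sqrt{a}} - \sqrt{a}\right)\right) & \text{if } a \in (0, +\infty), \\[2mm] \dfrac{l^2}{2} & \text{if } a = +\infty, \\[2mm] l^2\,e^{-\frac{l^2 b^+}{2}} & \text{if } a = 0, \end{cases} \tag{8}$$

where $b^+ = \max(b, 0)$, and

$$\mathcal{G}(a,b,l) = \begin{cases} l^2\,e^{\frac{l^2(a-b)}{2}}\,\Phi\left(l\left(\dfrac{b}{2\sqrt{a}} - \sqrt{a}\right)\right) & \text{if } a \in (0, +\infty), \\[2mm] 0 & \text{if } a = +\infty, \\[2mm] 1_{\{b > 0\}}\,l^2\,e^{-\frac{l^2 b}{2}} & \text{if } a = 0. \end{cases} \tag{9}$$

Here and in the following, the bracket notation refers to the duality bracket for probability measures on $\mathbb{R}$: for $\mu$ a probability measure and $\varphi$ a bounded measurable function,

$$\langle \mu, \varphi \rangle = \int_{\mathbb{R}} \varphi\,d\mu.$$

Notice that the assumption on $(X^{1,n}_0, \dots, X^{n,n}_0)_{n \ge 1}$ is for example satisfied when the random variables $X^{1,n}_0, \dots, X^{n,n}_0$ are i.i.d. according to some probability measure m on $\mathbb{R}$.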

This convergence result generalizes the previous result by Roberts et al. (1997) where the same diffusive limit is obtained under the restrictive assumption that the vector of initial positions $(X^{1,n}_0, \dots, X^{n,n}_0)$ is distributed according to the target distribution $p(x)\,dx$. In this case, $(X_t)_{t \ge 0}$ indeed solves the stochastic differential equation (3)-(4) with time-homogeneous coefficients (here, we use the fact that $\Gamma(I,I,l) = 2\mathcal{G}(I,I,l) = h(I,l)$ where $I = \int_{\mathbb{R}} (V'(x))^2 e^{-V(x)}\,dx = \int_{\mathbb{R}} V''(x)\,e^{-V(x)}\,dx < \infty$, see (Jourdain et al., 2012, Lemma 1)). Moreover, by taking $V(x) = \frac{x^2}{2} + \frac{1}{2}\ln(2\pi)$, this theorem also yields similar results as Christensen et al. (2005), where the authors consider a non-stationary case, but restrict their analysis to the evolution of $k \mapsto \frac{1}{n}\sum_{i=1}^n (X^{i,n}_k)^2$ for Gaussian targets.
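The coefficients of the nonlinear diffusion are explicit and easy to evaluate numerically. The sketch below assumes that, for $a \in (0,+\infty)$, $\Gamma(a,b,l) = l^2\Phi(-lb/(2\sqrt{a})) + l^2 e^{l^2(a-b)/2}\Phi(l(b/(2\sqrt{a}) - \sqrt{a}))$ and that $\mathcal{G}$ is the second of these two terms, with the limiting values at $a = 0$ and $a = +\infty$ given in the code; the Python names `Gamma` and `G_coef` are ours. It can be used to check the identity $\Gamma(I,I,l) = 2\mathcal{G}(I,I,l) = h(I,l)$ on a few values.

```python
import math

def Phi(x):
    """Cumulative distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Gamma(a, b, l):
    """Diffusion coefficient; a plays the role of E[(V')^2], b of E[V'']."""
    if a == 0.0:
        return l**2 * math.exp(-0.5 * l**2 * max(b, 0.0))
    if math.isinf(a):
        return 0.5 * l**2
    s = math.sqrt(a)
    return l**2 * (Phi(-l * b / (2.0 * s))
                   + math.exp(0.5 * l**2 * (a - b)) * Phi(l * (b / (2.0 * s) - s)))

def G_coef(a, b, l):
    """Drift coefficient multiplying -V'(x)."""
    if a == 0.0:
        return l**2 * math.exp(-0.5 * l**2 * b) if b > 0 else 0.0
    if math.isinf(a):
        return 0.0
    s = math.sqrt(a)
    return l**2 * math.exp(0.5 * l**2 * (a - b)) * Phi(l * (b / (2.0 * s) - s))

def h(I, l):
    """Stationary speed function."""
    return 2.0 * l**2 * Phi(-l * math.sqrt(I) / 2.0)

# At stationarity both arguments are equal to I.
I, l = 1.3, 2.0
print(Gamma(I, I, l), 2.0 * G_coef(I, I, l), h(I, l))
```

The exponential factor may overflow for very large values of $l^2(a-b)$; a production implementation would combine it with the Gaussian tail in log scale.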

In addition to the previous convergence result, we are able to identify the limiting average 
acceptance rate. 

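Proposition 1. Under the assumptions of Theorem 1, the function

$$t \mapsto \mathbb{E}\left| \mathbb{P}\big(A_{\lfloor nt \rfloor + 1} \,\big|\, \mathcal{F}^n_{\lfloor nt \rfloor}\big) - \frac{1}{l^2}\,\Gamma\big(\mathbb{E}[(V'(X_t))^2], \mathbb{E}[V''(X_t)], l\big) \right|$$

converges locally uniformly to 0 and in particular, the average acceptance rate $t \mapsto \mathbb{P}(A_{\lfloor nt \rfloor + 1})$ converges locally uniformly to $t \mapsto \mathrm{acc}(\mathbb{E}[(V'(X_t))^2], \mathbb{E}[V''(X_t)], l)$ where for any $a \ge 0$ and $b \in \mathbb{R}$,

$$\mathrm{acc}(a,b,l) = \frac{\Gamma(a,b,l)}{l^2}. \tag{10}$$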
2.2. A formal derivation of the limiting process. Let us introduce the infinitesimal generator associated to (7):

$$L_\mu \varphi(x) = \frac{1}{2}\,\Gamma\big(\langle \mu, (V')^2 \rangle, \langle \mu, V'' \rangle, l\big)\,\varphi''(x) - \mathcal{G}\big(\langle \mu, (V')^2 \rangle, \langle \mu, V'' \rangle, l\big)\,V'(x)\,\varphi'(x). \tag{11}$$

For a probability measure $\mu$ on $\mathbb{R}$, $\langle \mu, V'' \rangle$ is well defined by boundedness of $V''$ (see (6)), and $\langle \mu, (V')^2 \rangle$ is also well defined in $[0, +\infty]$.

The relationship between (7) and (11) is the following: if $(X_t)_{t \ge 0}$ satisfies (7), then for any smooth test function $\varphi$, $\varphi(X_t) - \int_0^t L_{P_s}\varphi(X_s)\,ds$ is a martingale, where $P_t$ denotes the law of $X_t$: for any $s \le t$,

$$\mathbb{E}\left( \varphi(X_t) - \int_s^t L_{P_r}\varphi(X_r)\,dr \,\Big|\, \mathcal{F}_s \right) = \varphi(X_s). \tag{12}$$

Actually, as explained in (Jourdain et al., 2012, Section 3.1), the martingale representation of the solution is a weak formulation of (7): solutions to (12) are solutions in distribution to (7).



Let us now present formally how (7) is derived. First, let us explain how the scaling of $\sigma_n$ as a function of n is chosen. The idea (see Roberts and Rosenthal) is to choose $\sigma_n$ in such a way that the limiting acceptance rate (when $n \to \infty$) is neither zero nor one. In the first case, this would mean that the variance of the proposal is too large, so that all proposed moves are rejected. In the second case, the variance of the proposal is too small, and the rate of convergence to equilibrium is thus not optimal. In particular, it is easy to check that $\sigma_n$ should go to zero as n goes to infinity. Now, notice that the acceptance rate satisfies:

$$\begin{aligned} \mathbb{E}\big(1_{A_{k+1}} \,\big|\, \mathcal{F}^n_k\big) &= \mathbb{E}\left( e^{-\sum_{i=1}^n \left( \sigma_n V'(X^{i,n}_k)\,G^i_{k+1} + \frac{\sigma_n^2}{2} V''(X^{i,n}_k) \right)} \wedge 1 \,\Big|\, \mathcal{F}^n_k \right) + O(n\sigma_n^3) \\ &= \exp\left(\frac{a_n - b_n}{2}\right) \Phi\left(\frac{b_n}{2\sqrt{a_n}} - \sqrt{a_n}\right) + \Phi\left(-\frac{b_n}{2\sqrt{a_n}}\right) + O(n\sigma_n^3), \end{aligned} \tag{13}$$

where $a_n = \sigma_n^2 \sum_{i=1}^n (V'(X^{i,n}_k))^2$ and $b_n = \sigma_n^2 \sum_{i=1}^n V''(X^{i,n}_k)$. The formula (13) is obtained by explicit computations (see (Roberts et al., 1997, Proposition 2.4)). From this expression, assuming a propagation of chaos (law of large numbers) on the random variables $(X^{i,n}_k)_{1 \le i \le n}$, one can check that the correct scaling for the variance is $\sigma_n^2 = \frac{l^2}{n}$ in order to obtain a non-trivial limiting acceptance rate (see (Jourdain et al., 2012, Section 2.3)).
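Using this scaling, we observe that, for a test function $\varphi : \mathbb{R} \to \mathbb{R}$,

$$\begin{aligned} \mathbb{E}\big(\varphi(X^{1,n}_{k+1}) \,\big|\, \mathcal{F}^n_k\big) = \varphi(X^{1,n}_k) &+ \varphi'(X^{1,n}_k)\,\frac{l}{\sqrt{n}}\,\mathbb{E}\big(G^1_{k+1} 1_{A_{k+1}} \,\big|\, \mathcal{F}^n_k\big) \\ &+ \frac{1}{2}\,\varphi''(X^{1,n}_k)\,\frac{l^2}{n}\,\mathbb{E}\big((G^1_{k+1})^2 1_{A_{k+1}} \,\big|\, \mathcal{F}^n_k\big) + O(n^{-3/2}). \end{aligned} \tag{14}$$

We compute (by conditioning with respect to $G^1_{k+1}$):

$$\begin{aligned} \mathbb{E}\big(G^1_{k+1} 1_{A_{k+1}} \,\big|\, \mathcal{F}^n_k\big) &= \mathbb{E}\left( G^1_{k+1} \left( e^{\sum_{i=1}^n \left( V(X^{i,n}_k) - V\left(X^{i,n}_k + \frac{l}{\sqrt{n}} G^i_{k+1}\right) \right)} \wedge 1 \right) \Big|\, \mathcal{F}^n_k \right) \\ &= -V'(X^{1,n}_k)\,\frac{1}{l\sqrt{n}}\,\mathcal{G}\big(\langle \nu^n_k, (V')^2 \rangle, \langle \nu^n_k, V'' \rangle, l\big) + O(n^{-1}), \end{aligned} \tag{15}$$

where

$$\nu^n_k = \frac{1}{n}\sum_{i=1}^n \delta_{X^{i,n}_k}$$

denotes the empirical distribution associated to the interacting particle system. The equation (15) is again a consequence of explicit computations (see (Jourdain et al., 2012, Equation (A.3))), and the fact that the remainder is of order $n^{-1}$ requires a detailed analysis (see (Jourdain et al., 2012, Proposition 7)). Likewise, for the diffusion term, we get

$$\begin{aligned} \mathbb{E}\big((G^1_{k+1})^2 1_{A_{k+1}} \,\big|\, \mathcal{F}^n_k\big) &= \mathbb{E}\left( (G^1_{k+1})^2 \left( e^{\sum_{i=1}^n \left( V(X^{i,n}_k) - V\left(X^{i,n}_k + \frac{l}{\sqrt{n}} G^i_{k+1}\right) \right)} \wedge 1 \right) \Big|\, \mathcal{F}^n_k \right) \\ &= \frac{1}{l^2}\,\Gamma\big(\langle \nu^n_k, (V')^2 \rangle, \langle \nu^n_k, V'' \rangle, l\big) + O(n^{-1/2}). \end{aligned} \tag{16}$$

To obtain (16), we again used an explicit computation (see (Jourdain et al., 2012, Equation (A.5))).

By plugging (15) and (16) into (14), we see that the correct scaling in time is to consider $Y^{1,n}_t$ such that $Y^{1,n}_{k/n} = X^{1,n}_k$ (diffusive timescale), and we get:

$$\mathbb{E}\big(\varphi(Y^{1,n}_{(k+1)/n}) \,\big|\, \mathcal{F}^n_k\big) = \varphi(Y^{1,n}_{k/n}) + \frac{1}{n}\,L_{\nu^n_k}\varphi(Y^{1,n}_{k/n}) + O(n^{-3/2}),$$

where $L_{\nu^n_k}$ is defined by (11). This can be seen as a discrete-in-time version (over a timestep of size 1/n) of the martingale property (12). Thus, by sending n to infinity, assuming that $\nu^n_k$ converges to the law of $Y^{1,n}_{k/n}$, we expect $Y^{1,n}$ to converge to a solution to (7). For a rigorous proof, we refer to Jourdain et al. (2012).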

3. Scaling limits for the MALA algorithm 

The aim of this section is to derive a diffusive limit for the MALA algorithm, following the same reasoning as for the RWM algorithm.

The Markov chain generated by the MALA algorithm writes:

$$X^{i,n}_{k+1} = X^{i,n}_k + Z^{i,n}_{k+1}\,1_{A_{k+1}} \quad\text{where}\quad Z^{i,n}_{k+1} = \sigma_n G^i_{k+1} - \frac{\sigma_n^2}{2}\,V'(X^{i,n}_k), \quad 1 \le i \le n, \tag{17}$$

where

$$A_{k+1} = \left\{ U_{k+1} \le e^{\sum_{i=1}^n \left[ V(X^{i,n}_k) - V(X^{i,n}_k + Z^{i,n}_{k+1}) + \frac{1}{2}\left( (G^i_{k+1})^2 - \left( G^i_{k+1} - \frac{\sigma_n}{2}\left( V'(X^{i,n}_k) + V'(X^{i,n}_k + Z^{i,n}_{k+1}) \right) \right)^2 \right) \right]} \right\}$$

is the accepting event. Here again, $(G^i_k)_{i,k \ge 1}$ is a sequence of i.i.d. normal random variables independent from a sequence $(U_k)_{k \ge 1}$ of i.i.d. random variables uniform on [0,1]. In Section 3.1, we derive formally a limiting diffusion process. It appears that the scaling to be used depends on the sign of $\mathbb{E}\big( ((V')^2 V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2)(X^{1,n}_k) \big)$. This is more rigorously discussed in Section 3.2 for a Gaussian target probability measure.
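The exponent appearing in the accepting event is a sum over the coordinates of one elementary term, which can be coded directly. The sketch below is illustrative only: the function name `mala_log_ratio_term` and its argument names are ours. For the Gaussian potential $V(x) = x^2/2$ (the additive constant plays no role), a direct computation shows that this term equals $\frac{\sigma^2}{8}(x^2 - y^2)$ where y is the proposed value, which gives a convenient sanity check.

```python
import math

def mala_log_ratio_term(x, g, sigma, V, dV):
    """Contribution of one coordinate to the log acceptance ratio of MALA.

    x is the current position, g the standard Gaussian increment, and the
    proposal is y = x + sigma * g - (sigma^2 / 2) * V'(x).
    """
    y = x + sigma * g - 0.5 * sigma**2 * dV(x)
    backward = g - 0.5 * sigma * (dV(x) + dV(y))
    return V(x) - V(y) + 0.5 * (g**2 - backward**2)

# Sanity check on the Gaussian potential.
V_gauss = lambda u: 0.5 * u * u
dV_gauss = lambda u: u
x, g, sigma = 1.7, -0.4, 0.3
y = x + sigma * g - 0.5 * sigma**2 * dV_gauss(x)
print(mala_log_ratio_term(x, g, sigma, V_gauss, dV_gauss),
      sigma**2 / 8.0 * (x**2 - y**2))
```

For a general smooth potential, the same function can be used to observe numerically that the term is of order $\sigma^3$ as $\sigma \to 0$, the terms of order $\sigma$ and $\sigma^2$ cancelling out.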

3.1. A formal derivation of the limiting process. 

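3.1.1. Asymptotic analysis and limiting process. We adopt the same strategy as for the RWM algorithm in Section 2.2. Let us first discuss how to choose the proper scaling for $\sigma_n$. Using a Taylor expansion, one obtains:

$$\begin{aligned} & V(X^{i,n}_k) - V(X^{i,n}_k + Z^{i,n}_{k+1}) + \frac{1}{2}\left( (G^i_{k+1})^2 - \left( G^i_{k+1} - \frac{\sigma_n}{2}\left( V'(X^{i,n}_k) + V'(X^{i,n}_k + Z^{i,n}_{k+1}) \right) \right)^2 \right) \\ &\quad = \sigma_n^3 \left( \frac{V^{(3)}(X^{i,n}_k)}{12}\,(G^i_{k+1})^3 - \frac{V'V''(X^{i,n}_k)}{4}\,G^i_{k+1} \right) \\ &\qquad + \sigma_n^4 \left( \frac{(V')^2V''(X^{i,n}_k)}{8} + \frac{V^{(4)}(X^{i,n}_k)}{24}\,(G^i_{k+1})^4 - \frac{2V^{(3)}V'(X^{i,n}_k) + (V''(X^{i,n}_k))^2}{8}\,(G^i_{k+1})^2 \right) + O(\sigma_n^5). \end{aligned} \tag{18}$$

Setting, as above, $\nu^n_k = \frac{1}{n}\sum_{i=1}^n \delta_{X^{i,n}_k}$, one has

$$\mathbb{E}\left( \left( \sum_{i=1}^n \left( \frac{V^{(3)}(X^{i,n}_k)}{12}\,(G^i_{k+1})^3 - \frac{V'V''(X^{i,n}_k)}{4}\,G^i_{k+1} \right) \right)^2 \,\Big|\, \mathcal{F}^n_k \right) = \frac{n}{48}\,\big\langle \nu^n_k,\ 5(V^{(3)})^2 - 6V^{(3)}V'V'' + 3(V'V'')^2 \big\rangle$$

so that one expects that $\sum_{i=1}^n \left( \frac{V^{(3)}(X^{i,n}_k)}{12}(G^i_{k+1})^3 - \frac{V'V''(X^{i,n}_k)}{4}G^i_{k+1} \right) = O(\sqrt{n})$ and similarly that

$$\sum_{i=1}^n \left( \frac{V^{(4)}(X^{i,n}_k)}{24}\,\big((G^i_{k+1})^4 - 3\big) - \frac{2V^{(3)}V'(X^{i,n}_k) + (V''(X^{i,n}_k))^2}{8}\,\big((G^i_{k+1})^2 - 1\big) \right) = O(\sqrt{n}).$$

If this holds and $\lim_{n \to \infty} \sigma_n = 0$, then

$$\begin{aligned} & \sum_{i=1}^n \left[ V(X^{i,n}_k) - V(X^{i,n}_k + Z^{i,n}_{k+1}) + \frac{1}{2}\left( (G^i_{k+1})^2 - \left( G^i_{k+1} - \frac{\sigma_n}{2}\left( V'(X^{i,n}_k) + V'(X^{i,n}_k + Z^{i,n}_{k+1}) \right) \right)^2 \right) \right] \\ &\quad = \frac{n\sigma_n^4}{8}\,\big\langle \nu^n_k,\ (V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2 \big\rangle + O(\sqrt{n}\,\sigma_n^3) + O(n\sigma_n^5). \end{aligned}$$

From this,

$$\mathbb{E}\big(1_{A_{k+1}} \,\big|\, \mathcal{F}^n_k\big) = e^{\frac{n\sigma_n^4}{8}\left\langle \nu^n_k,\ (V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2 \right\rangle} \wedge 1 + O(\sqrt{n}\,\sigma_n^3) + O(n\sigma_n^5). \tag{19}$$

Here, we have assumed that $\langle \nu^n_k, (V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2 \rangle \neq 0$. From this formula, we get the correct scaling for the variance, in order to obtain a non-trivial limiting acceptance rate (in accordance with (Christensen et al., 2005, Section 5)):

$$\sigma_n^2 = \frac{l^2}{\sqrt{n}}.$$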
OPTIMAL SCALING FOR THE TRANSIENT PHASE OF MH ALGORITHMS 9 
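Now, following the same reasoning as in Section 2.2, we have: for a test function $\varphi : \mathbb{R} \to \mathbb{R}$,

$$\mathbb{E}\big(\varphi(X^{1,n}_{k+1}) \,\big|\, \mathcal{F}^n_k\big) = \varphi(X^{1,n}_k) + \varphi'(X^{1,n}_k)\,\mathbb{E}\big(Z^{1,n}_{k+1} 1_{A_{k+1}} \,\big|\, \mathcal{F}^n_k\big) + \frac{1}{2}\,\varphi''(X^{1,n}_k)\,\mathbb{E}\big((Z^{1,n}_{k+1})^2 1_{A_{k+1}} \,\big|\, \mathcal{F}^n_k\big) + O(n^{-3/4}).$$

Using the Lipschitz continuity of $y \mapsto e^y \wedge 1$, one may remove the contribution of the i-th coordinate in the acceptance ratio and then introduce it again after using conditional independence to check that

$$\mathbb{E}\big(G^1_{k+1} 1_{A_{k+1}} \,\big|\, \mathcal{F}^n_k\big) = \mathbb{E}(G^1_{k+1})\,\mathbb{E}\big(1_{A_{k+1}} \,\big|\, \mathcal{F}^n_k\big) + O(n^{-3/4}) = O(n^{-3/4}).$$

From this, one obtains

$$\mathbb{E}\big(\varphi(X^{1,n}_{k+1}) \,\big|\, \mathcal{F}^n_k\big) = \varphi(X^{1,n}_k) + \frac{l^2}{2\sqrt{n}} \left( e^{\frac{l^4}{8}\left\langle \nu^n_k,\ (V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2 \right\rangle} \wedge 1 \right) \big( \varphi''(X^{1,n}_k) - V'(X^{1,n}_k)\,\varphi'(X^{1,n}_k) \big) + O(n^{-3/4}).$$

The correct scaling in time is thus to consider a piecewise linear process $Y^{1,n}$ such that $Y^{1,n}_{k/\sqrt{n}} = X^{1,n}_k$ (this is again the standard diffusive timescale), and the expected propagation of chaos limit is solution to the nonlinear stochastic differential equation:

$$\begin{aligned} & dX_t = \sqrt{w(t,l)}\,dB_t - w(t,l)\,\frac{1}{2}V'(X_t)\,dt \\ & \text{where } w(t,l) = l^2 \left( e^{\frac{l^4}{8}\,\mathbb{E}\left( \left( (V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2 \right)(X_t) \right)} \wedge 1 \right). \end{aligned} \tag{20}$$

Under appropriate assumptions on the potential V, we believe that a rigorous proof of this result could be done using similar techniques as for the RWM algorithm in Jourdain et al. (2012).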


3.1.2. Relation to previous results in the literature. These results are related to previous ones in the literature. First, in the Gaussian case $V(x) = \frac{x^2}{2} + \frac{1}{2}\ln(2\pi)$, one obtains from (20) that $(\mathbb{E}(X_t^2))_{t \ge 0}$ solves the ordinary differential equation

$$\frac{d}{dt}\mathbb{E}(X_t^2) = l^2 \left( e^{\frac{l^4}{8}\left( \mathbb{E}(X_t^2) - 1 \right)} \wedge 1 \right) \big( 1 - \mathbb{E}(X_t^2) \big).$$

We recover here a result from (Christensen et al., 2005, Theorem 2), where it is shown that the process $\frac{1}{n}\sum_{i=1}^n (X^{i,n}_{\lfloor \sqrt{n}\,t \rfloor})^2$ satisfies this ordinary differential equation in the limit $n \to \infty$.
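The behavior of the second moment in the Gaussian case is easy to visualize by integrating the ordinary differential equation numerically. The sketch below assumes the equation reads $\frac{d}{dt}m_t = l^2\big(e^{\frac{l^4}{8}(m_t - 1)} \wedge 1\big)(1 - m_t)$ with $m_t = \mathbb{E}(X_t^2)$; the explicit Euler scheme, the step size and the function name `second_moment_path` are our own illustrative choices.

```python
import math

def second_moment_path(m0, l, T=10.0, dt=1e-3):
    """Explicit Euler integration of the ODE satisfied by m_t = E(X_t^2)."""
    m = m0
    path = [m]
    for _ in range(int(round(T / dt))):
        # Limiting acceptance rate: equal to 1 as soon as m >= 1.
        acceptance = min(math.exp(l**4 * (m - 1.0) / 8.0), 1.0)
        m += dt * l**2 * acceptance * (1.0 - m)
        path.append(m)
    return path

hot_start = second_moment_path(3.0, 1.0)    # E(X_0^2) > 1: all moves accepted
cold_start = second_moment_path(0.2, 1.0)   # E(X_0^2) < 1: acceptance rate < 1
print(hot_start[-1], cold_start[-1])
```

Starting above 1, the acceptance factor is identically 1 and the second moment relaxes exponentially fast at rate $l^2$; starting below 1, the relaxation is slowed down by the exponential factor.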
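Second, in the stationary case, namely when $(X^{1,n}_0, \dots, X^{n,n}_0)$ are distributed according to the target density p defined by (2), the equalities (obtained by integration by parts)

$$\int_{\mathbb{R}} V^{(4)} e^{-V} = \int_{\mathbb{R}} V^{(3)} V' e^{-V} = \int_{\mathbb{R}} \big( (V')^2 V'' - (V'')^2 \big) e^{-V}$$

imply that $\mathbb{E}\langle \nu^n_k, (V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2 \rangle = 0$ and this changes the scaling of the limiting acceptance rate in (19). In Roberts and Rosenthal (1998), it is shown that in this case, the correct scaling is $\sigma_n^2 = \frac{l^2}{n^{1/3}}$ and then $(X^{1,n}_{\lfloor n^{1/3} t \rfloor})_{t \ge 0}$ converges in distribution to the solution $(X_t)_{t \ge 0}$ of the stochastic differential equation

$$\begin{aligned} & dX_t = \sqrt{z(l)}\,dB_t - z(l)\,\frac{1}{2}V'(X_t)\,dt, \\ & \text{where } z(l) = 2l^2\,\Phi\left( -\frac{l^3}{2}\sqrt{\frac{\langle m,\ 5(V^{(3)})^2 + 3(V'')^3 \rangle}{48}} \right), \end{aligned} \tag{21}$$

where $dm = e^{-V(x)}\,dx$.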

3.1.3. Practical counterparts. The practical counterparts of the convergence results discussed above are the following. We can actually distinguish between three regimes:

• On time intervals such that $\mathbb{E}\big( ((V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2)(X^{1,n}_k) \big) < 0$, the correct scaling to obtain a diffusive limit is $\sigma_n^2 = \frac{l^2}{n^{1/2}}$ and there exists an optimal value of l to speed up the time scale of the dynamics of $X_t$, by maximizing $w(t,l)$ (see Equation (20)).

• On time intervals such that $\mathbb{E}\big( ((V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2)(X^{1,n}_k) \big) = 0$, the correct scaling to obtain a diffusive limit is $\sigma_n^2 = \frac{l^2}{n^{1/3}}$ and again, there exists an optimal value of l to speed up the convergence to equilibrium, by maximizing $z(l)$ (see Equation (21) and Roberts and Rosenthal (1998)).

• On time intervals such that $\mathbb{E}\big( ((V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2)(X^{1,n}_k) \big) > 0$, with the scaling $\sigma_n^2 = \frac{l^2}{n^{1/2}}$, we observe that $w(t,l) = l^2$ in (20) so that one should take l as large as possible. This is an indication of the fact that the correct scaling for $\sigma_n^2$ in this case should be such that $\lim_{n \to \infty} \sqrt{n}\,\sigma_n^2 = +\infty$. Indeed, in the Gaussian case, Proposition 3 below shows that one should take $\sigma_n$ going to 0 as slowly as possible.

In conclusion, in the MALA case (and contrary to the RWM case), the correct scaling as a function of the dimension is not the same at equilibrium and in the transient phase. Moreover, in the transient phase, the scaling depends on the sign of

$$\mathbb{E}\big( ((V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2)(X^{1,n}_k) \big).$$

It thus seems difficult to draw any general simple recommendation for practitioners from this analysis. It is likely that the assumption that the target probability is the product of n one-dimensional laws is too restrictive to understand correctly the scaling $n \to \infty$ in this case.
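As an illustration of the first regime, the maximization in l can be carried out numerically. The sketch below assumes that the speed function has the form $w = l^2\big(e^{l^4 c/8} \wedge 1\big)$, where c stands for the (negative) value of $\mathbb{E}\big(((V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2)(X_t)\big)$ at the time considered; the names `w_speed` and `argmax_w` are ours. Under this assumption, differentiating $2\ln l + l^4c/8$ gives the closed form $l^\star = (-4/c)^{1/4}$, with a corresponding acceptance factor $e^{-1/2}$; this closed form is our own elementary computation and serves only as a check of the grid search.

```python
import math

def w_speed(l, c):
    """Speed l^2 * min(exp(l^4 c / 8), 1) for a given value c of the moment."""
    return l**2 * min(math.exp(l**4 * c / 8.0), 1.0)

def argmax_w(c, l_max=10.0, steps=200000):
    """Grid search for the maximizer of l -> w_speed(l, c) on (0, l_max]."""
    grid = (l_max * (j + 1) / steps for j in range(steps))
    return max(grid, key=lambda l: w_speed(l, c))

c = -0.5
l_num = argmax_w(c)
l_closed = (-4.0 / c) ** 0.25
print(l_num, l_closed, math.exp(l_num**4 * c / 8.0))
```

When c is positive, `w_speed` reduces to $l^2$ and the grid search simply returns `l_max`, in line with the third regime.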

3.2. Rigorous results in the Gaussian case and when $\mathbb{E}((X^{1,n}_0)^2) > 1$. In this section, we consider the case of a Gaussian target, namely

$$V(x) = \frac{x^2}{2} + \frac{1}{2}\ln(2\pi). \tag{22}$$

We thus have

$$(V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2 = x^2 - 1.$$

The aim of this section is to study in detail the situation when $\mathbb{E}((X^{1,n}_0)^2) > 1$.

Proposition 2. Let us consider $(X^{i,n}_k)$ solution to (17) for the Gaussian target (22), with a variance independent of n:

$$\sigma_n = l \in (0,2).$$

Let m be a probability measure on $\mathbb{R}$ such that $\langle m, x^2 \rangle > \frac{1}{1 - l^2/4}$. We endow the space $\mathbb{R}^{\mathbb{N}}$ with the product topology. If the initial random variables $(X^{1,n}_0, \dots, X^{n,n}_0)$ are exchangeable and m-chaotic, then the processes $((X^{1,n}_k)_{k \ge 0}, \dots, (X^{n,n}_k)_{k \ge 0})$ are P-chaotic where P denotes the law of the Markov chain

$$Y_{k+1} = \left(1 - \frac{l^2}{2}\right) Y_k + l\,G_{k+1} \tag{23}$$

with the sequence $(G_k)_{k \ge 1}$ i.i.d. according to the normal law and independent from the initial position $Y_0$ distributed according to m.

A simple case for which the assumption on the initial condition is satisfied is i.i.d. initial conditions $(X^{i,n}_0)_{i \ge 1}$ with law m.

Notice that $Y_k$ converges in law to $\mathcal{N}\left(0, \frac{1}{1 - l^2/4}\right)$ as $k \to +\infty$. The asymptotic distribution converges to the target density when $l \to 0$. Of course, for fixed n and $i \in \{1, \dots, n\}$, $X^{i,n}_k$ converges in law to $\mathcal{N}(0,1)$ as $k \to +\infty$. So the limits $k \to \infty$ and $n \to \infty$ do not interchange, meaning that, for large n, the rate of convergence in distribution of $(X^{i,n}_k)_{k \ge 1}$ to $\mathcal{N}(0,1)$ should deteriorate.
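The role of the threshold $\frac{1}{1 - l^2/4}$ can be checked on the second moment of the limiting chain. Assuming the chain is $Y_{k+1} = (1 - l^2/2)Y_k + lG_{k+1}$ with $G_{k+1}$ standard normal and independent of $Y_k$, the second moment obeys the linear recursion $\mathbb{E}(Y_{k+1}^2) = (1 - l^2/2)^2\,\mathbb{E}(Y_k^2) + l^2$, whose fixed point is exactly this threshold. The function name `second_moments` below is ours.

```python
def second_moments(m2, l, steps):
    """Iterate E(Y_{k+1}^2) = (1 - l^2/2)^2 E(Y_k^2) + l^2, starting from m2."""
    out = [m2]
    for _ in range(steps):
        m2 = (1.0 - 0.5 * l * l) ** 2 * m2 + l * l
        out.append(m2)
    return out

l = 1.0
threshold = 1.0 / (1.0 - l * l / 4.0)   # stationary variance of the chain
moments = second_moments(2.0, l, 30)    # start above the threshold
print(threshold, moments[0], moments[-1])
```

Since the recursion is affine with a nonnegative slope smaller than 1 for $l \in (0,2)$, a sequence started above the fixed point stays above it and converges to it geometrically.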

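Proof of Proposition 2. Let $(Y^{1,n}, \dots, Y^{n,n})$ with $Y^{i,n}_0 = X^{i,n}_0$ and $Y^{i,n}_{k+1} = \left(1 - \frac{l^2}{2}\right) Y^{i,n}_k + l\,G^i_{k+1}$ denote the processes obtained when all moves are accepted in the MALA algorithm (17). The proof is divided into two steps. We are first going to prove that the processes $((Y^{1,n}_k)_{k \ge 0}, \dots, (Y^{n,n}_k)_{k \ge 0})$ are P-chaotic (this would be trivial if the initial conditions $(X^{i,n}_0)_{i \ge 1}$ were supposed to be i.i.d.). Then, setting

$$A_{k+1} = \left\{ U_{k+1} \le e^{\sum_{i=1}^n \left[ V(Y^{i,n}_k) - V(Y^{i,n}_{k+1}) + \frac{1}{2}\left( (G^i_{k+1})^2 - \left( G^i_{k+1} - \frac{l}{2}\left( V'(Y^{i,n}_k) + V'(Y^{i,n}_{k+1}) \right) \right)^2 \right) \right]} \right\},$$

we will check that $\forall K \in \mathbb{N}^*$, $\lim_{n \to \infty} \mathbb{P}\left( \bigcap_{k=1}^K A_k \right) = 1$. Since, on the event $\bigcap_{k=1}^K A_k$, $(X^{1,n}_k, \dots, X^{n,n}_k)_{0 \le k \le K} = (Y^{1,n}_k, \dots, Y^{n,n}_k)_{0 \le k \le K}$, one obtains the P-chaoticity of the processes $((X^{1,n}_k)_{k \ge 0}, \dots, (X^{n,n}_k)_{k \ge 0})$ by combining the two steps.

For the first step, notice that for fixed $j, K \in \mathbb{N}^*$, the law of $((Y^{1,n}_k, \dots, Y^{j,n}_k))_{0 \le k \le K}$ is

$$m^n_j(dy^1_0, \dots, dy^j_0) \prod_{k=0}^{K-1} \left( Q(y^1_k, dy^1_{k+1}) \times \dots \times Q(y^j_k, dy^j_{k+1}) \right)$$

where $Q(y, dy') = \frac{1}{l\sqrt{2\pi}}\,e^{-\frac{\left(y' - (1 - l^2/2)\,y\right)^2}{2l^2}}\,dy'$ and the law $m^n_j$ of $(X^{1,n}_0, \dots, X^{j,n}_0)$ converges weakly to $m^{\otimes j}$ as $n \to \infty$ (since the initial conditions $(X^{1,n}_0, \dots, X^{n,n}_0)$ are m-chaotic). Since $y \mapsto Q(y, dy') \in \mathcal{P}(\mathbb{R})$ is weakly continuous, this law converges weakly to $\bigotimes_{i=1}^j \left( m(dy^i_0) \prod_{k=0}^{K-1} Q(y^i_k, dy^i_{k+1}) \right)$, which is the j-fold product of the image of P by the canonical restriction to the K + 1 first coordinates. Hence the processes $((Y^{1,n}_k)_{k \ge 0}, \dots, (Y^{n,n}_k)_{k \ge 0})$ are P-chaotic.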
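For the second step, let us introduce

$$S^n_k = \sum_{i=1}^n \left[ V(Y^{i,n}_k) - V(Y^{i,n}_{k+1}) + \frac{1}{2}\left( (G^i_{k+1})^2 - \left( G^i_{k+1} - \frac{l}{2}\left( V'(Y^{i,n}_k) + V'(Y^{i,n}_{k+1}) \right) \right)^2 \right) \right].$$

One has

$$\mathbb{P}\big( \{A_{k+1}\}^c \big) \le \mathbb{P}(S^n_k < 0). \tag{24}$$

Some tedious but simple computations yield (using $V(x) = \frac{x^2}{2} + \frac{1}{2}\ln(2\pi)$)

$$V(Y^{i,n}_k) - V(Y^{i,n}_{k+1}) + \frac{1}{2}\left( (G^i_{k+1})^2 - \left( G^i_{k+1} - \frac{l}{2}\left( V'(Y^{i,n}_k) + V'(Y^{i,n}_{k+1}) \right) \right)^2 \right) = \frac{l^2}{8}\left( (Y^{i,n}_k)^2 - (Y^{i,n}_{k+1})^2 \right) \tag{25}$$

so that (in law)

$$S^n_k = \frac{n l^4}{8} \left[ \left(1 - \frac{l^2}{4}\right) \langle \mu^n_k, y^2 \rangle - \frac{1}{n}\sum_{i=1}^n (G^i_{k+1})^2 - \frac{2}{l}\left(1 - \frac{l^2}{2}\right) \sqrt{\frac{\langle \mu^n_k, y^2 \rangle}{n}}\ \tilde{G}_{k+1} \right]$$

with $\tilde{G}_{k+1}$ a normal random variable independent from $\mu^n_k = \frac{1}{n}\sum_{i=1}^n \delta_{Y^{i,n}_k}$. Since exchangeability of the initial condition $(Y^{1,n}_0, \dots, Y^{n,n}_0)$ is preserved by the evolution, the propagation of chaos result obtained in the first step implies (and is actually equivalent to) the convergence in probability of the empirical measures $\mu^n = \frac{1}{n}\sum_{i=1}^n \delta_{Y^{i,n}} \in \mathcal{P}(\mathbb{R}^{\mathbb{N}})$ to P (see (Sznitman, 1991, Proposition 2.2)). In particular, $\mu^n_k$ converges in probability to the law $P_k$ of $Y_k$, solution to (23).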

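With this law of large numbers, we see that in order to estimate $\mathbb{P}(S^n_k < 0)$ we need to understand the evolution of $\langle P_k, y^2 \rangle = \mathbb{E}((Y_k)^2)$ with k. One has $\langle P_{k+1}, y^2 \rangle = \left(1 - \frac{l^2}{2}\right)^2 \langle P_k, y^2 \rangle + l^2$, and since $\langle P_0, y^2 \rangle = \langle m, y^2 \rangle > \frac{1}{1 - l^2/4}$, one easily checks by induction that for all $k \in \mathbb{N}$, $\langle P_k, y^2 \rangle > \frac{1}{1 - l^2/4}$. Hence for fixed $k \in \mathbb{N}$, there exist $M < +\infty$ and $\varepsilon > 0$ such that $\langle P_k, y^2 \wedge M \rangle > \frac{1 + 2\varepsilon}{1 - l^2/4}$. One has

$$\begin{aligned} \mathbb{P}(S^n_k < 0) \le\ & \mathbb{P}\left( \frac{1}{n}\sum_{i=1}^n (G^i_{k+1})^2 > 1 + \varepsilon \right) + \mathbb{P}\left( \langle \mu^n_k, y^2 \wedge M \rangle < \frac{1 + 2\varepsilon}{1 - l^2/4} \right) \\ & + \mathbb{P}\left( \frac{2}{l}\left(1 - \frac{l^2}{2}\right) \sqrt{\frac{\langle \mu^n_k, y^2 \rangle}{n}}\ \tilde{G}_{k+1} > \left(1 - \frac{l^2}{4}\right) \langle \mu^n_k, y^2 \rangle - (1 + \varepsilon),\ \langle \mu^n_k, y^2 \rangle \ge \frac{1 + 2\varepsilon}{1 - l^2/4} \right). \end{aligned}$$

The first term of the right-hand side converges to 0 as $n \to +\infty$, since, by the strong law of large numbers, $\frac{1}{n}\sum_{i=1}^n (G^i_{k+1})^2$ converges a.s. to 1. The second term converges to 0 since $\langle \mu^n_k, y^2 \wedge M \rangle$ converges in probability to $\langle P_k, y^2 \wedge M \rangle > \frac{1 + 2\varepsilon}{1 - l^2/4}$. The third term vanishes when $l^2 = 2$ and is otherwise bounded from above by $\Phi\left( -\frac{l\varepsilon}{2\,|1 - l^2/2|} \sqrt{\frac{n(1 - l^2/4)}{1 + 2\varepsilon}} \right)$, so that it also converges to 0. Hence $\mathbb{P}(S^n_k < 0)$ tends to 0 as $n \to \infty$ and with (24) one deduces that for fixed $K \in \mathbb{N}^*$, $\mathbb{P}\left( \bigcap_{k=1}^K A_k \right) \ge 1 - \sum_{k=1}^K \mathbb{P}\big( \{A_k\}^c \big)$ tends to 1. □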

As is clear from the previous proposition, for a fixed variance $\sigma_n = l$ and if $\mathbb{E}((X^{1,n}_0)^2) > 1$, then, for sufficiently small l (namely $l < 2\sqrt{1 - 1/\mathbb{E}((X^{1,n}_0)^2)}$) and in the limit $n \to \infty$, (i) the components do not interact and evolve independently according to the explicit Euler discretization (23) (with a timestep $l^2$) of the Langevin dynamics $dY_t = dB_t - \frac{Y_t}{2}\,dt$ and (ii) the system remains in the region $\mathbb{E}((X^{1,n}_k)^2) > 1$ for all $k \ge 0$.

Based on the previous result, it is natural to look for a diffusive limit for a $\sigma_n$ which goes to zero at an arbitrary rate with respect to n.

Proposition 3. Let us consider $(X^{i,n}_k)$ solution to (17) for the Gaussian target (22), with a variance $\sigma_n$ satisfying:

$$\lim_{n \to \infty} \sigma_n = 0 \quad\text{and}\quad \lim_{n \to \infty} n\sigma_n^2 = +\infty.$$

Let m be a probability measure on $\mathbb{R}$ such that $\langle m, x^2 \rangle > 1$ and $\langle m, x^8 \rangle < +\infty$. If the initial random variables $(X^{1,n}_0, \dots, X^{n,n}_0)$ are i.i.d. according to m, then the processes $\big( (X^{1,n}_{\lfloor t/\sigma_n^2 \rfloor})_{t \ge 0}, \dots, (X^{n,n}_{\lfloor t/\sigma_n^2 \rfloor})_{t \ge 0} \big)$ are Q-chaotic where Q denotes the law of the Ornstein-Uhlenbeck process

$$dY_t = dB_t - \frac{Y_t}{2}\,dt \tag{26}$$

with the initial position $Y_0$ distributed according to m and independent from the Brownian motion $(B_t)_{t \ge 0}$. Moreover, the limiting mean acceptance rate is 1.

Remark 1. For a more general potential V, if the initial random variables $(X^{1,n}_0, \dots, X^{n,n}_0)$ are exchangeable and m-chaotic with $\langle m, (V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2 \rangle > 0$, one expects the limit in law to be the one of the solution of $Y_t = Y_0 + B_t - \int_0^t \frac{V'(Y_s)}{2}\,ds$. But, unlike in the Gaussian case, it is not clear that $\mathbb{E}[((V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2)(Y_t)] > 0$ for all $t > 0$. Therefore, setting $T = \inf\{t > 0 : \mathbb{E}[((V')^2V'' + V^{(4)} - 2V^{(3)}V' - (V'')^2)(Y_t)] = 0\}$ with the convention $\inf \emptyset = +\infty$ and denoting by $Q^T$ the law of $(Y_t)_{t \in [0,T)}$, one actually expects the processes $\big( (X^{1,n}_{\lfloor t/\sigma_n^2 \rfloor})_{t \in [0,T)}, \dots, (X^{n,n}_{\lfloor t/\sigma_n^2 \rfloor})_{t \in [0,T)} \big)$ to be $Q^T$-chaotic.

Proof. As in the proof of Proposition 2, let $(Y^{1,n}, \dots, Y^{n,n})$ with $Y^{i,n}_0 = X^{i,n}_0$ and $Y^{i,n}_{k+1} = \left(1 - \frac{\sigma_n^2}{2}\right) Y^{i,n}_k + \sigma_n G^i_{k+1}$ denote the processes obtained when all moves are accepted in the MALA algorithm (17). The processes $\big( (Y^{1,n}_{\lfloor t/\sigma_n^2 \rfloor})_{t \ge 0}, \dots, (Y^{n,n}_{\lfloor t/\sigma_n^2 \rfloor})_{t \ge 0} \big)$ are independent and identically distributed and their common distribution converges weakly to Q by the strong convergence analysis of the Euler scheme applied to (26). Hence, to conclude the proof, it is enough to check that for fixed $T > 0$, $\lim_{n \to \infty} \mathbb{P}\left( \bigcap_{k=1}^{\lceil T/\sigma_n^2 \rceil} A_k \right) = 1$ where the events $A_k$ are defined as in the proof of Proposition 2 (with l replaced by $\sigma_n$). To do so, we use an upper-bound sharper than (24). Let us introduce (using (25)):

n 

nan ^ " 
where the random variables 



l<i<n 



14 BENJAMIN JOURDAIN, TONY LELIEVRE, BLAZEJ MIASOJEDOW 

are independent and identically distributed. Then, we have: 



We need to estimate the moments of the random variables -R^!". To do so, we assume from 
now on that n is large enough so that an < \/2 and we first estimate the moments of 

(2 \ fe ^ / 2 \ 

^r+-nE (i-f) G). 



One has, using the fact that c7„ Ylj=i " 4)'' G'j ~ M (o^ ^'^IZ^yT" ) > 

+ 420 1 - ^ J 'V, ^ {m,x^) + 105 ' ^ ^ 



< (m, x^) + 56(m, o;^) + 840(m, x^) + 3360(m, x^) + 1680. 



Therefore sup^.^^^^supj.>oE((i?^'")^) < +00. Moreover, for n large enough so that < 
2{{m,x^}-l) / J ]^ cr^ < 2), 

1 _ ^) = 1 + f 1 - 4 V V f 1 - 4 V^, _ 1 + - 1 



4yvv«// ^ 2yVV J J ~ 2 

where the latter inequality holds for k < [T/a'^l . From now on, we suppose that n is large 
enough so that < and we fix fc < [T/al\. Setting ct = (e-(^'^')T) one 

has
Therefore (using in particular the fact that E(i?^'") > 0),



\i=i ' / \i=i / 



< + 



(nE(i?^'"))3 

3n(n - l)Var2((i?i'")) + nE ((i?^'" - E(i?^'"))4 



(nE(i?^''^))3 

(28) < (3n^ + 13n)E((i;r)^) < ^ 

where $C_T$ is some constant not depending on $n$ and $k$. With (27), we deduce that

(29) p U i-^^+ir < E ^(^'^^+i}j^4^- 

\ fc=0 / fc=0 " 

Since hm„_j.oo '^c'n = +00, we conclude that linin^oo ^ ^ni='i^"^ •^tj ~ ^- ^ 

Remark 2. In the case when liuin^oo na'^ = and the initial conditions (Xq Xq'"') 
are i.i.d. according to m such that {m,x^) < +00, then, whatever the sign of (m,x^) — 1, 
the processes ((X.!"' 2 \)t>o, ■ ■ ■ , i^u? 2 \)t>o) 0,1"^ Q-chaotic where Q denotes the law of the 

Ornstein-Uhlenbeck process $Y_t = Y_0 + B_t - \int_0^t \frac{Y_s}{2}\,ds$ with the initial position $Y_0$ distributed
according to $m$ and independent from the Brownian motion $(B_t)_{t\geq 0}$.

Indeed, for n large enough so that an < one may check that sup,^.>o E((Y^*''^)^) < C and 
replace ()28p by the estimation 



n 



so that 

k=0 J 

which converges to zero when n goes to infinity. 
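
The comparison with the Euler scheme used in the proof above can be illustrated on second moments. The sketch below assumes the all-accepted recursion $Y_{k+1} = (1 - \sigma^2/2)Y_k + \sigma G_{k+1}$ and the limiting dynamics $dY_t = dB_t - (Y_t/2)\,dt$; the function names are hypothetical and only the variance (not the full law) is tracked.

```python
import math


def accepted_chain_variance(v0, sigma, n_steps):
    """Variance of Y_k for Y_{k+1} = (1 - sigma^2/2) Y_k + sigma G_{k+1} (assumed recursion)."""
    contraction = (1.0 - sigma ** 2 / 2.0) ** 2
    v = v0
    for _ in range(n_steps):
        # Var(c Y + sigma G) = c^2 Var(Y) + sigma^2 for an independent standard normal G.
        v = contraction * v + sigma ** 2
    return v


def ou_variance(v0, t):
    """Variance at time t of dY = dB - (Y/2) dt started with variance v0."""
    return 1.0 + (v0 - 1.0) * math.exp(-t)


sigma = 0.1
steps = round(1.0 / sigma ** 2)  # number of steps needed to reach time t = 1
print(accepted_chain_variance(4.0, sigma, steps), ou_variance(4.0, 1.0))
```

For small $\sigma$ the two printed values are close, and the discrete chain has stationary variance $1/(1 - \sigma^2/4)$, which tends to the variance 1 of the target.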

4. Longtime convergence for the RWM nonlinear dynamics 

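We would like to study the limiting dynamics (7) obtained for the RWM algorithm, that
we recall for convenience

$dX_t = \Gamma^{1/2}(\mathbb{E}[(V'(X_t))^2], \mathbb{E}[V''(X_t)], \ell)\,dB_t - \mathcal{G}(\mathbb{E}[(V'(X_t))^2], \mathbb{E}[V''(X_t)], \ell)\,V'(X_t)\,dt$

where $\Gamma$ and $\mathcal{G}$ are respectively defined by (8) and (9). The associated Fokker-Planck equation
is

(30) $\partial_t\psi_t = \partial_x\left(\mathcal{G}(a[\psi_t], b[\psi_t], \ell)\,V'\psi_t + \frac{\Gamma(a[\psi_t], b[\psi_t], \ell)}{2}\,\partial_x\psi_t\right)$

where $a[\psi_t] = \int_{\mathbb{R}} (V')^2\psi_t$ and $b[\psi_t] = \int_{\mathbb{R}} V''\psi_t$.

Let us denote $\psi_\infty = \exp(-V)$. Notice that $a[\psi_\infty] = b[\psi_\infty]$ and $\mathcal{G}(a,a,\ell) = \Gamma(a,a,\ell)/2$. We
thus expect $\psi_\infty$ to be the longtime limit of $\psi_t$.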

4.1. Stationary solution. We start the analysis of the limiting process by checking that the
solution of (7) has the expected stationary distribution.

Proposition 4. There exists a unique stationary distribution $\mu$ for the process $X_t$ defined by
(7). In addition, this distribution is absolutely continuous with respect to the Lebesgue measure,
with density $\psi_\infty(x) = \exp(-V(x))$.

Before proving Proposition 4 we need some preliminary facts.

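Lemma 1. Let us define the functions, for $x \in \mathbb{R}$, $a > 0$, $b \in \mathbb{R}$ and $\ell > 0$,

(31) $f(x) = \exp\left(\frac{x^2}{2}\right)\Phi(x),$

(32) $h(x) = xf(x) = x\exp\left(\frac{x^2}{2}\right)\Phi(x),$

(33) $F(a,b,\ell) = \begin{cases} \dfrac{b\,\Gamma(a,b,\ell) - 2a\,\mathcal{G}(a,b,\ell)}{b-a} & \text{if } a \neq b,\\ 2\ell^2\left(\left(1 + \dfrac{\ell^2a}{4}\right)\Phi\left(-\dfrac{\ell\sqrt{a}}{2}\right) - \dfrac{\ell\sqrt{a}}{2\sqrt{2\pi}}\exp\left(-\dfrac{\ell^2a}{8}\right)\right) & \text{if } a = b.\end{cases}$

One can check that $f$ and $h$ are increasing functions,

(34) $\forall \ell > 0,\ \forall (a,b) \in \mathbb{R}_+ \times \mathbb{R},\quad F(a,b,\ell) > 0$

and

(35) $\mathrm{sign}(\Gamma(a,b,\ell) - 2\mathcal{G}(a,b,\ell)) = \mathrm{sign}(a - b)$

where the function sign is defined by:

(36) $\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x > 0,\\ 0 & \text{if } x = 0,\\ -1 & \text{if } x < 0.\end{cases}$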
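
The claims of Lemma 1 can be checked numerically on a grid. The following is a minimal sketch, assuming that for $0 < a < +\infty$ the functions of (8) and (9) read $\Gamma(a,b,\ell) = \ell^2\Phi(-\ell b/(2\sqrt a)) + \ell^2 e^{\ell^2(a-b)/2}\Phi(\ell(b/(2\sqrt a) - \sqrt a))$ and $\mathcal{G}(a,b,\ell) = \ell^2 e^{\ell^2(a-b)/2}\Phi(\ell(b/(2\sqrt a) - \sqrt a))$; these definitions are not restated in this section, and the helper names are illustrative.

```python
import math


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def f(x):
    return math.exp(x * x / 2.0) * Phi(x)


def h(x):
    return x * f(x)


def Gamma(a, b, l):
    # Assumed expression of (8) for 0 < a < +infinity.
    r = math.sqrt(a)
    return l * l * (Phi(-l * b / (2 * r))
                    + math.exp(l * l * (a - b) / 2) * Phi(l * (b / (2 * r) - r)))


def G(a, b, l):
    # Assumed expression of (9) for 0 < a < +infinity.
    r = math.sqrt(a)
    return l * l * math.exp(l * l * (a - b) / 2) * Phi(l * (b / (2 * r) - r))


def sign(x):
    return (x > 0) - (x < 0)


# Check (35) on a few points with a != b.
for a in (0.5, 1.0, 2.0):
    for b in (-1.0, 0.25, 0.75, 1.5, 3.0):
        for l in (0.5, 1.0, 2.0):
            assert sign(Gamma(a, b, l) - 2 * G(a, b, l)) == sign(a - b)
```

The case $a = b$ is left out of the loop because the difference $\Gamma - 2\mathcal{G}$ is then zero only up to rounding errors.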

Proof. Let us recall the well-known estimation 



The derivative of $f$ is

$f'(x) = \frac{1}{\sqrt{2\pi}} + x\exp\left(\frac{x^2}{2}\right)\Phi(x).$

For $x \geq 0$, $f'(x) > 0$. For $x < 0$, using the upper-bound in (37), we also obtain $f'(x) > 0$.
Therefore, the function $f$ is increasing.

Since $h'(x) = (1 + x^2)\exp\left(\frac{x^2}{2}\right)\Phi(x) + \frac{x}{\sqrt{2\pi}}$, it is obvious that $h'(x) > 0$ for $x \geq 0$. For $x < 0$
this comes from the lower-bound in (37).
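By definitions of $\Gamma$ and $\mathcal{G}$, we get

(38) $\Gamma(a,b,\ell) - 2\mathcal{G}(a,b,\ell) = \ell^2\left(\Phi\left(-\frac{\ell b}{2\sqrt{a}}\right) - \exp\left(\frac{\ell^2}{2}(a-b)\right)\Phi\left(\ell\left(\frac{b}{2\sqrt{a}} - \sqrt{a}\right)\right)\right).$

Using the identity

$\exp\left(\frac{\ell^2}{2}(a-b)\right) = \exp\left(\frac{1}{2}\left(\frac{\ell(b-2a)}{2\sqrt{a}}\right)^2\right)\exp\left(-\frac{\ell^2b^2}{8a}\right),$

the right-hand-side of (38) can be rewritten in terms of $f$ (defined by (31)):

$\Gamma(a,b,\ell) - 2\mathcal{G}(a,b,\ell) = \ell^2\exp\left(-\frac{\ell^2b^2}{8a}\right)\left(f\left(-\frac{\ell b}{2\sqrt{a}}\right) - f\left(\frac{\ell(b-2a)}{2\sqrt{a}}\right)\right).$

Now it is clear that

$\mathrm{sign}(\Gamma(a,b,\ell) - 2\mathcal{G}(a,b,\ell)) = \mathrm{sign}\left(f\left(-\frac{\ell b}{2\sqrt{a}}\right) - f\left(\frac{\ell(b-2a)}{2\sqrt{a}}\right)\right).$

Recall that the function $f$ is increasing and thus $\mathrm{sign}(\Gamma(a,b,\ell) - 2\mathcal{G}(a,b,\ell)) = \mathrm{sign}(a - b)$.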
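Similarly,

$F(a,b,\ell) = \begin{cases} \dfrac{2\ell\sqrt{a}}{b-a}\exp\left(-\dfrac{\ell^2b^2}{8a}\right)\left(h\left(\dfrac{\ell(b-2a)}{2\sqrt{a}}\right) - h\left(-\dfrac{\ell b}{2\sqrt{a}}\right)\right) & \text{if } a \neq b,\\ 2\ell^2\exp\left(-\dfrac{\ell^2a}{8}\right)h'\left(-\dfrac{\ell\sqrt{a}}{2}\right) & \text{if } a = b,\end{cases}$

and (34) is a consequence of the positivity of $h'$.
We are now in position to prove Proposition 4.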



<> 



Proof of Proposition 4. Let $c = \int_{\mathbb{R}}(V'(x))^2\psi_\infty(x)\,dx$. Since $V''$ is bounded then one can check that
$c = \int_{\mathbb{R}}V''(x)\psi_\infty(x)\,dx < \infty$ (see (Jourdain et al., 2012, Lemma 1)). By (35) we get that
$\Gamma(c,c,\ell) = 2\mathcal{G}(c,c,\ell)$. Let us define the Langevin diffusion

$dX_t = \sqrt{2\mathcal{G}(c,c,\ell)}\,dB_t - \mathcal{G}(c,c,\ell)V'(X_t)\,dt,$

with $X_0$ distributed according to the density $\psi_\infty$. It is well known that for any $t \geq 0$ the density
of $X_t$ is $\psi_\infty$ and therefore $c = \mathbb{E}[(V'(X_t))^2] = \mathbb{E}[V''(X_t)]$. Then it is clear that the process
$X_t$ satisfies (7). Hence $\psi_\infty(x)\,dx$ is a stationary probability distribution for the stochastic
differential equation (7).

Let us now prove the uniqueness of the invariant measure. Assume that there exists another
stationary probability measure with density $\rho_\infty$ (the fact that the stationary measure admits a
density is standard, since the diffusion term is bounded from below). Assume $\int_{\mathbb{R}}(V')^2\rho_\infty = +\infty$.
Since $\mathcal{G}(+\infty,b,\ell) = 0$ and $\Gamma(+\infty,b,\ell) = \ell^2/2$, the stochastic differential equation (7) with $X_0$
distributed according to the density $\rho_\infty$ reduces in this case to $dX_t = \frac{\ell}{\sqrt{2}}\,dB_t$ which does not
admit a stationary distribution. Thus, necessarily, we have

$\int_{\mathbb{R}}(V')^2\rho_\infty < \infty.$

Let us denote $a = \int_{\mathbb{R}}(V')^2\rho_\infty$ and $b = \int_{\mathbb{R}}V''\rho_\infty$. Then, Equation (7) with $X_0$ distributed
according to the density $\rho_\infty$ reduces to

$dX_t = \Gamma^{1/2}(a,b,\ell)\,dB_t - \mathcal{G}(a,b,\ell)V'(X_t)\,dt.$

The stationary distribution thus writes

$\rho_\infty \propto \exp\left(-\frac{2\mathcal{G}(a,b,\ell)}{\Gamma(a,b,\ell)}V\right).$

By integration by parts, we obtain that

$b\,\Gamma(a,b,\ell) = 2a\,\mathcal{G}(a,b,\ell).$

Hence, by (34) we obtain $a = b$ and by (35) we get that $\frac{2\mathcal{G}(a,b,\ell)}{\Gamma(a,b,\ell)} = 1$. In conclusion, $\rho_\infty = \exp(-V) = \psi_\infty$.







4.2. Longtime convergence. It is actually possible to prove that, for fixed $\ell > 0$, the law
of $X_t$ solution to (7) converges exponentially fast to the equilibrium density $\psi_\infty$. The proof
is based on entropy estimates, using the Fokker-Planck equation (30), and requires the notion
of logarithmic Sobolev inequality.

Definition 2. The probability measure $\nu$ satisfies a logarithmic Sobolev inequality with constant
$\rho > 0$ (in short LSI($\rho$)) if and only if, for any probability measure $\mu$ absolutely continuous with
respect to $\nu$,

(39) $H(\mu|\nu) \leq \frac{1}{2\rho}I(\mu|\nu)$

where $H(\mu|\nu) = \int \ln\left(\frac{d\mu}{d\nu}\right)d\mu$ is the Kullback-Leibler divergence (or relative entropy) of $\mu$
with respect to $\nu$ and $I(\mu|\nu) = \int \left|\nabla\ln\left(\frac{d\mu}{d\nu}\right)\right|^2 d\mu$ is the Fisher information of $\mu$ with respect
to $\nu$.

With a slight abuse of notation, we will denote in the following $H(\psi|\phi)$ and $I(\psi|\phi)$ the
Kullback-Leibler divergence and the Fisher information associated with the continuous probability
distributions $\psi(x)\,dx$ and $\phi(x)\,dx$. We recall that, by the Csiszár-Kullback inequality
(see for instance (Ané et al., 2000, Théorème 8.2.7 p. 139)), for any probability densities $\psi$
and $\phi$,

(40) $\int_{\mathbb{R}}|\psi - \phi| \leq \sqrt{2H(\psi|\phi)},$

so that $H(\psi|\phi)$ may be seen as a measure of the "distance" between $\psi$ and $\phi$.

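Theorem 2. Let us assume (6), and that $X_0$ admits a density $\psi_0$ such that $\mathbb{E}[(V'(X_0))^2] < +\infty$
and $H(\psi_0|\psi_\infty) < \infty$. Then, for all $t \geq 0$,

(41) $\frac{d}{dt}H(\psi_t|\psi_\infty) \leq -\frac{\Gamma(a[\psi_t], b[\psi_t], \ell)}{2}\left(1 - \mathcal{F}(a[\psi_t], b[\psi_t], \ell)\right)I(\psi_t|\psi_\infty)$

where the function $\mathcal{F}$ is defined by: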



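(42) $\mathcal{F}(a,b,\ell) = \begin{cases} \dfrac{2a}{b-a}\,\dfrac{\mathcal{G}(a,b,\ell) - \Gamma(a,b,\ell)/2}{\Gamma(a,b,\ell)} & \text{if } a \neq b,\\ -\dfrac{\ell^2a}{4} + \dfrac{\ell\sqrt{a}}{2\sqrt{2\pi}}\,\dfrac{\exp\left(-\ell^2a/8\right)}{\Phi\left(-\ell\sqrt{a}/2\right)} & \text{if } a = b.\end{cases}$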

In addition, the function $t \mapsto H(\psi_t|\psi_\infty)$ is decreasing.

Let us assume moreover that $\psi_\infty = e^{-V}$ satisfies a LSI($\rho$). Then there exists a positive and
non-increasing function $\lambda : [0,+\infty) \to (0,+\infty)$ such that $\forall t \geq 0$,

(43) $H(\psi_t|\psi_\infty) \leq e^{-t\lambda(H(\psi_0|\psi_\infty))}H(\psi_0|\psi_\infty).$

Equation (43) shows that $\psi_t$ converges exponentially fast to $\psi_\infty$.

Remark 3. Roughly speaking, $\psi_\infty$ satisfies a LSI if $V$ grows sufficiently fast at infinity. For
example, according to (Ané et al., 2000, Théorème 6.4.3), a sufficient condition for $\psi_\infty$ to
satisfy a LSI is that $|V'|$ does not vanish outside of some compact subset of $\mathbb{R}$ and

$\lim_{|x|\to\infty}\frac{V''(x)}{(V'(x))^2} = 0 \quad\text{and}\quad \limsup_{|x|\to\infty}\frac{V(x) + \ln|V'(x)|}{(V'(x))^2} < +\infty.$

In particular, in the Gaussian case $V(x) = \frac{x^2}{2} + \frac{1}{2}\ln(2\pi)$, $\psi_\infty(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)$ satisfies
LSI(1).

To prove Theorem 2 we need some properties of the function $\mathcal{F}$.



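Lemma 2. Let $\ell > 0$. The function $(a,b) \mapsto \mathcal{F}(a,b,\ell)$ is non-negative on $\mathbb{R}_+ \times \mathbb{R}$ and such
that

$\forall M \in (0,+\infty),\quad \sup_{(a,b)\in[0,M]\times\mathbb{R}}\mathcal{F}(a,b,\ell) < 1.$

Proof. Recall (see Lemma 1) that $f(x) = e^{x^2/2}\Phi(x)$ is positive, increasing and that $h(x) = xf(x)$
is also increasing. Setting, for $(a,b) \in \mathbb{R}_+^* \times \mathbb{R}$, $\chi(a,b,\ell) = -\frac{\ell b}{2\sqrt{a}}$ and $\zeta(a,b,\ell) = \frac{\ell(b-2a)}{2\sqrt{a}}$ and, for
$(x,y) \in \mathbb{R}^2$,

$\psi(x,y) = \begin{cases} -\dfrac{(x+y)(f(x) - f(y))}{(f(x) + f(y))(x-y)} & \text{if } x \neq y,\\ -\dfrac{xf'(x)}{f(x)} & \text{otherwise},\end{cases}$

one has

$\mathcal{F}(a,b,\ell) = \begin{cases} 0 & \text{if } a = 0,\\ \psi(\chi(a,b,\ell), \zeta(a,b,\ell)) & \text{otherwise}.\end{cases}$

The monotonicity and the positivity of $f$ imply that $\psi$ is non-negative on $\{(x,y) : x + y \leq 0\}$.
Since for $(a,b) \in \mathbb{R}_+^* \times \mathbb{R}$, $\chi(a,b,\ell) + \zeta(a,b,\ell) = -\ell\sqrt{a} < 0$, one deduces that $\mathcal{F}$ is non-negative.

When $x > y$, since $h$ is increasing, $yf(y) < xf(x)$ which implies $-(x+y)(f(x) - f(y)) <
(f(x) + f(y))(x - y)$ and therefore $\psi(x,y) < 1$. This inequality remains valid for $y > x$ by
symmetry of $\psi$ and for $y = x$ since $xf'(x) + f(x) > 0$.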

For {x,y) G M?, with x > and —l\fM < x + y < 0, (so that x — y > 0) one has 
< < ^ and < < 1 so that ^p{x, y) < With the symmetry of -0, one 

deduces that sup^^^yy_i^^^^y^Q^^^y^i^ 'ip{x,y) < ^. Since / is and positive, one easily 
checks that ijj is continuous on M.^. As < 1 and {{x, y) : —l\fM <x-|-?/<0, xVy< 
is compact, one obtains that sup^^ ^(2^, y) < 1- 

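As $\mathcal{F}(0,b,\ell) = 0$ and for $(a,b) \in \mathbb{R}_+^* \times \mathbb{R}$, $\chi(a,b,\ell) + \zeta(a,b,\ell) = -\ell\sqrt{a}$, one concludes that
$\sup_{(a,b)\in[0,M]\times\mathbb{R}}\mathcal{F}(a,b,\ell) < 1$.

We are now in position to prove Theorem 2.

Proof of Theorem 2. By simple computation, we have (for notational convenience, we write $a$, $b$, $\Gamma$, $\mathcal{G}$ for
$a[\psi_t]$, $b[\psi_t]$, $\Gamma(a,b,\ell)$, $\mathcal{G}(a,b,\ell)$)

$\frac{d}{dt}\int_{\mathbb{R}}\psi_t\ln(\psi_t/\psi_\infty) = \int_{\mathbb{R}}\partial_t\psi_t\ln\psi_t + \int_{\mathbb{R}}V\partial_t\psi_t$

$= \int_{\mathbb{R}}\partial_x\left(\mathcal{G}V'\psi_t + \Gamma\partial_x\psi_t/2\right)\ln\psi_t + \int_{\mathbb{R}}V\partial_x\left(\mathcal{G}V'\psi_t + \Gamma\partial_x\psi_t/2\right)$

$= -\mathcal{G}\int_{\mathbb{R}}V'\partial_x\psi_t - (\Gamma/2)\int_{\mathbb{R}}(\partial_x\ln\psi_t)^2\psi_t - \mathcal{G}\int_{\mathbb{R}}(V')^2\psi_t - (\Gamma/2)\int_{\mathbb{R}}V'\partial_x\psi_t$

(44) $= \mathcal{G}b - (\Gamma/2)\int_{\mathbb{R}}(\partial_x\ln\psi_t)^2\psi_t - \mathcal{G}a + \Gamma b/2.$

On the other hand, we have

$\int_{\mathbb{R}}(\partial_x\ln(\psi_t/\psi_\infty))^2\psi_t = \int_{\mathbb{R}}(\partial_x\ln\psi_t + V')^2\psi_t = \int_{\mathbb{R}}(\partial_x\ln\psi_t)^2\psi_t + 2\int_{\mathbb{R}}(\partial_x\ln\psi_t)V'\psi_t + \int_{\mathbb{R}}(V')^2\psi_t = \int_{\mathbb{R}}(\partial_x\ln\psi_t)^2\psi_t - 2b + a.$

We thus obtain

$\frac{d}{dt}\int_{\mathbb{R}}\psi_t\ln(\psi_t/\psi_\infty) = \mathcal{G}b - (\Gamma/2)\left(\int_{\mathbb{R}}(\partial_x\ln(\psi_t/\psi_\infty))^2\psi_t + 2b - a\right) - \mathcal{G}a + \Gamma b/2$

$= -(\Gamma/2)\int_{\mathbb{R}}(\partial_x\ln(\psi_t/\psi_\infty))^2\psi_t + (b-a)(\mathcal{G} - \Gamma/2)$

$= -(\Gamma/2)\left(\int_{\mathbb{R}}(\partial_x\ln(\psi_t/\psi_\infty))^2\psi_t - \frac{\mathcal{G} - \Gamma/2}{(b-a)\Gamma/2}(b-a)^2\right)$

where the ratio $\frac{\mathcal{G} - \Gamma/2}{(b-a)\Gamma/2}$ is non-negative by (35). To control this term, we remark that

$(a-b)^2 = \left(\int_{\mathbb{R}}(V')^2\psi_t - \int_{\mathbb{R}}V''\psi_t\right)^2 = \left(\int_{\mathbb{R}}V'(V'\psi_t + \partial_x\psi_t)\right)^2 = \left(\int_{\mathbb{R}}V'\partial_x\ln(\psi_t/e^{-V})\psi_t\right)^2 \leq a\int_{\mathbb{R}}(\partial_x\ln(\psi_t/\psi_\infty))^2\psi_t.$

Inserting the function $\mathcal{F}$ defined in (42), we deduce that

$\frac{d}{dt}\int_{\mathbb{R}}\psi_t\ln(\psi_t/\psi_\infty) \leq -\frac{\Gamma(a,b,\ell)}{2}(1 - \mathcal{F}(a,b,\ell))\int_{\mathbb{R}}(\partial_x\ln(\psi_t/\psi_\infty))^2\psi_t$

which is (41). Since by Lemma 2, $\mathcal{F}(a,b,\ell)$ is not greater than 1, we deduce that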

$\frac{d}{dt}\int_{\mathbb{R}}\psi_t\ln(\psi_t/\psi_\infty) \leq 0.$

Let us now assume that $\psi_\infty$ satisfies a logarithmic Sobolev inequality (39) with parameter
$\rho$. We thus have, from (41),

(45) $\frac{d}{dt}H(\psi_t|\psi_\infty) \leq -\rho\,\Gamma(a[\psi_t], b[\psi_t], \ell)\left(1 - \mathcal{F}(a[\psi_t], b[\psi_t], \ell)\right)H(\psi_t|\psi_\infty).$

Since $|b[\psi_t]| \leq \|V''\|_\infty$, by (Jourdain et al., 2012, Lemma 2, Equation (3.2)), the function
$t \mapsto \Gamma(a[\psi_t], b[\psi_t], \ell)$ is bounded from below by a positive constant. Thus, to obtain exponential
convergence, in view of Lemma 2, we need a (uniform-in-time) upper bound on $\int_{\mathbb{R}}(V')^2\psi_t$, to
get a (uniform-in-time) positive lower bound on $1 - \mathcal{F}(a[\psi_t], b[\psi_t], \ell)$. This is the aim of the
next paragraph.

First, notice that by (Jourdain et al., 2012, Lemma 1 and Lemma 3), $\int_{\mathbb{R}}(V')^2(\psi_t + \psi_\infty) < +\infty$.
Now, according to (Otto and Villani, 2000, Theorem 1), since $\psi_\infty$ satisfies a LSI($\rho$), $\psi_\infty$
also satisfies the transport inequality: for any probability density $\psi$ on $\mathbb{R}$,

$W_2^2(\psi,\psi_\infty) = \inf_{\gamma}\int_{\mathbb{R}^2}(x-y)^2\gamma(dx,dy) \leq \frac{2}{\rho}\int_{\mathbb{R}}\psi\ln(\psi/\psi_\infty) = \frac{2}{\rho}H(\psi|\psi_\infty)$

where, in the definition of the quadratic Wasserstein distance $W_2$, the infimum is taken over
all coupling measures $\gamma$ on $\mathbb{R}^2$ with marginals $\psi(x)\,dx$ and $\psi_\infty(y)\,dy$. Moreover, for a coupling
measure $\gamma$ between the probability measures $\psi_t(x)\,dx$ and $\psi_\infty(y)\,dy$, we have, using Cauchy-
Schwarz inequality,



JR 



{V'{x) + V'{y))iV'{x) - V'{y)hidx, dy) 



< 



{x - yf-f{dx,dy) 



1/2 



By taking the infimum over all coupling measures between ipt{x)dx and ipoo{y)dy, using the 
above transport inequality and the monotonicity of the relative entropy with respect to t, we 
deduce that 



< 



w 



//||2 



1/2 



< I -11^ 
p 



//||2 



^(V'olV'o 



+ 2 / (F')Vo 



1/2 



J||y"||^iI(Vo|V'oo) andd= |||F"|l^/M(^')'V'oo,one concludes that | 4(r)'(V'r 



Setting c 
V'oo)! < so that 

Vt > 0, 



c + Vc^ + Ad 



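By definition of $a[\psi_t]$ and Lemma 2, one deduces that $t \mapsto (1 - \mathcal{F}(a[\psi_t], b[\psi_t], \ell))$ is bounded
from below by a positive and non-increasing function of $H(\psi_0|\psi_\infty) = \int_{\mathbb{R}}\psi_0\ln(\psi_0/\psi_\infty)$. Moreover,
recall that $\inf_{t\geq 0}\Gamma(a[\psi_t], b[\psi_t], \ell) > 0$. Inserting these lower bounds into (45), we conclude
that there exists a positive and non-increasing function $\lambda : \mathbb{R}_+ \to \mathbb{R}_+^*$ such that

$\forall t \geq 0,\quad \frac{d}{dt}H(\psi_t|\psi_\infty) \leq -\lambda(H(\psi_0|\psi_\infty))\,H(\psi_t|\psi_\infty),$
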
which yields (43).

5. Optimization strategies for the RWM algorithm

In this section, we discuss how to choose the constant $\ell$ in the scaling $\sigma_n^2 = \frac{\ell^2}{n}$ in order to
optimize the convergence to equilibrium, using the nonlinear diffusion limit (7).
As a preliminary remark, notice that we will restrict the discussion to cases when

(46) $b[\psi_t] = \mathbb{E}(V''(X_t)) > 0.$

Indeed, points where $V''$ is negative correspond to neighborhoods of local maxima of the potential
$V$, which are visited with very low probability over large time intervals by the dynamics (7).
Moreover, we observe from (44) that if $b[\psi_t] \leq 0$, then, since $\Gamma$ and $\mathcal{G}$ are
non-negative functions and $a[\psi_t] \geq 0$, $\frac{d}{dt}H(\psi_t|\psi_\infty) \leq -\frac{\Gamma(a[\psi_t], b[\psi_t], \ell)}{2}\int_{\mathbb{R}}(\partial_x\ln\psi_t)^2\psi_t$ so that,
since $\lim_{\ell\to\infty}\Gamma(a,b,\ell) = +\infty$ (when $b \leq 0$), $\ell$ should be chosen as large as possible in order to
leave the concave region.

In the following, we thus assume (46).

5.1. Maximization of the exponential rate of convergence. In view of the inequalities
(41) and (45), it seems natural to try to choose $\ell$ maximizing (for given values $(a,b) =
(a[\psi_t], b[\psi_t])$)

$\ell \mapsto \Gamma(a,b,\ell)(1 - \mathcal{F}(a,b,\ell)) = \frac{b\,\Gamma(a,b,\ell) - 2a\,\mathcal{G}(a,b,\ell)}{b-a} = F(a,b,\ell),$

in order to maximize the exponential rate of convergence to zero of $H(\psi_t|\psi_\infty)$. In view of (34),
for $a \neq b$, this is equivalent to maximizing $\ell \mapsto |b\,\Gamma(a,b,\ell) - 2a\,\mathcal{G}(a,b,\ell)|$.

Remark 4. We notice that, for $X_t$ solution to (7),

(47) $2\frac{d}{dt}\mathbb{E}(V(X_t)) = b\,\Gamma(a,b,\ell) - 2a\,\mathcal{G}(a,b,\ell)$

with $(a,b) = (\mathbb{E}((V'(X_t))^2), \mathbb{E}(V''(X_t)))$, so that this optimization procedure has a simple
interpretation in terms of the evolution of the energy: it amounts to maximizing $\left|\frac{d}{dt}\mathbb{E}(V(X_t))\right|$,
namely making the largest possible moves in terms of energy. This seems quite a reasonable
objective.

Remark 5. In the Gaussian case (namely when $V(x) = \frac{x^2}{2} + \frac{1}{2}\ln(2\pi)$), and assuming that
the initial condition is also Gaussian, the density $\psi_t$ remains Gaussian for all time. Let us
denote $m(t) = \mathbb{E}(X_t)$ its mean and $s(t) = \mathbb{E}(X_t^2)$ its second order moment, which completely
characterize the Gaussian law at time $t$. Simple computations, still valid for non Gaussian
initial conditions, yield

(48) $\frac{ds}{dt} = \Gamma(s,1,\ell) - 2s\,\mathcal{G}(s,1,\ell) = F(s,1,\ell)(1-s), \qquad \frac{dm}{dt} = -\mathcal{G}(s,1,\ell)\,m,$

where the first equation corresponds to (47), since $V'(x) = x$ and $V''(x) = 1$. We observe
that the optimization procedure in this case amounts to maximizing $\left|\frac{ds}{dt}\right|$. This accelerates the
convergence to the equilibrium value 1 of $s$.
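
The moment equations (48) are easy to integrate numerically. The sketch below uses an explicit Euler scheme with a constant $\ell$ and monitors the Gaussian relative entropy $\frac12(s - \ln(s - m^2) - 1)$. The expressions used for $\Gamma$ and $\mathcal{G}$ are assumed (they are the ones of (8) and (9) for $0 < a < +\infty$, not restated here), and the step size, horizon and function names are illustrative choices.

```python
import math


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def G(a, b, l):
    # Assumed expression of (9) for 0 < a < +infinity.
    r = math.sqrt(a)
    return l * l * math.exp(l * l * (a - b) / 2) * Phi(l * (b / (2 * r) - r))


def Gamma(a, b, l):
    # Assumed expression of (8) for 0 < a < +infinity.
    r = math.sqrt(a)
    return l * l * Phi(-l * b / (2 * r)) + G(a, b, l)


def entropy(m, s):
    """Relative entropy of N(m, s - m^2) with respect to N(0, 1)."""
    return 0.5 * (s - math.log(s - m * m) - 1.0)


def integrate_moments(m0, s0, l, dt=0.01, n_steps=2000):
    """Explicit Euler scheme for ds/dt = Gamma - 2 s G, dm/dt = -G m (Gaussian target)."""
    m, s = m0, s0
    path = [(m, s, entropy(m, s))]
    for _ in range(n_steps):
        g = G(s, 1.0, l)
        ds = Gamma(s, 1.0, l) - 2.0 * s * g
        dm = -g * m
        m, s = m + dt * dm, s + dt * ds
        path.append((m, s, entropy(m, s)))
    return path


path = integrate_moments(m0=1.0, s0=4.0, l=2.38)
print(path[0], path[-1])
```

Replacing the constant `l` by a value recomputed at each step from the current `s` is the natural way to experiment with the strategies discussed in this section.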



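Let us denote

$F_1(s,\ell) = F(s,1,\ell) = \begin{cases} \ell^2\exp\left(-\dfrac{\ell^2}{2}\right) & \text{if } s = 0,\\ 2\ell^2\left(\left(1 + \dfrac{\ell^2}{4}\right)\Phi\left(-\dfrac{\ell}{2}\right) - \dfrac{\ell}{2\sqrt{2\pi}}\exp\left(-\dfrac{\ell^2}{8}\right)\right) & \text{if } s = 1,\\ \dfrac{\ell^2}{1-s}\left(\Phi\left(-\dfrac{\ell}{2\sqrt{s}}\right) + (1-2s)\exp\left(\dfrac{\ell^2(s-1)}{2}\right)\Phi\left(\dfrac{\ell}{2\sqrt{s}} - \ell\sqrt{s}\right)\right) & \text{if } s \in (0,1)\cup(1,+\infty),\end{cases}$

the function to be maximized in the Gaussian case, see Remark 5. We observe that (using
the fact that $b > 0$),

(49) $F(a,b,\ell) = \frac{1}{b}F_1\left(\frac{a}{b}, \ell\sqrt{b}\right)$

so that the general maximization problem on $F$ can be reduced to the maximization problem
on $F_1$. Notice that the function $F_1$ is $C^\infty$ on $\mathbb{R}_+ \times \mathbb{R}_+$.

Figure 1. Solid line: the function $s \mapsto \ell^*(s)$. Dashed line: the function $s \mapsto x^*\sqrt{s}$.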

Lemma 3. For any $s \geq 0$, the function $\ell \mapsto F_1(s,\ell)$ admits a unique global maximum at a
point

(50) $\ell^*(s) = \mathrm{argmax}_{\ell > 0}F_1(s,\ell).$

The proof of this Lemma is given in the appendix. From Lemma 3 and Equation (49), we
deduce that, for $(a,b) \in \mathbb{R}_+ \times \mathbb{R}_+^*$, there exists a unique $\ell^*(a,b)$ such that

(51) $\ell^*(a,b) = \mathrm{argmax}_{\ell > 0}F(a,b,\ell)$

and that

(52) $\ell^*(a,b) = \frac{1}{\sqrt{b}}\,\ell^*\left(\frac{a}{b}\right).$

In particular, $\ell^*(s) = \ell^*(s,1)$. Notice that these scaling results show that a constant $\ell$ strategy
is far from optimal in the transient case, since when $a$ and $b$ vary, the optimal value $\ell^*(a,b)$
also varies.

We now consider three regimes: the near equilibrium case $s \to 1$ (recall that at equilibrium,
$a = b$ and thus $s = a/b = 1$), and the two situations far from equilibrium $s \to 0$ and $s \to \infty$
(see Figure 1 for an illustration). In the Gaussian case (see Remark 5), $s(t) = \mathbb{E}(X_t^2)$ so that
these three regimes are easy to understand in terms of second moment.

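Lemma 4. We have the following asymptotic behaviors for the function $\ell^*$:

• $s \to 1$: The function $\ell \mapsto F_1(1,\ell)$ admits a unique maximum at point $\ell^*(1) \approx 1.85$.
Moreover,

(53) $\lim_{s\to 1}\ell^*(s) = \ell^*(1)$

and thus $\ell^*(a,b) \sim_{a/b\to 1} \frac{\ell^*(1)}{\sqrt{b}}$.

• $s \to 0$: The function $\ell \mapsto F_1(0,\ell)$ admits a unique maximum at point $\ell^*(0) = \sqrt{2}$.
Moreover,

(54) $\lim_{s\to 0}\ell^*(s) = \ell^*(0) = \sqrt{2},$

and thus $\ell^*(a,b) \sim_{a/b\to 0} \frac{\sqrt{2}}{\sqrt{b}}$.

• $s \to \infty$: Let us introduce $\psi(x) = x\sqrt{\frac{2}{\pi}}\,e^{-\frac{x^2}{8}} - x^2\Phi\left(-\frac{x}{2}\right)$. The function $\psi$ admits a
unique maximum at point $x^* \approx 1.22$. Moreover,

(55) $\lim_{s\to\infty}\frac{\ell^*(s)}{\sqrt{s}} = x^*$

so that $\ell^*(a,b) \sim_{a/b\to\infty} \frac{x^*\sqrt{a}}{b}$.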
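
The three constants of Lemma 4 can be recovered numerically. The following sketch maximizes $F_1(0,\cdot)$, $F_1(1,\cdot)$ and $\psi$ with a golden-section search, which assumes that each function is unimodal on the search interval $[10^{-3}, 6]$ (an assumption of this sketch, consistent with the uniqueness statements above). The function names and the interval are illustrative; for large $s$ the general branch of `F1` may overflow and is not used here.

```python
import math


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def F1(s, l):
    """The function F_1(s, l) = F(s, 1, l), using the three cases of its definition."""
    if s == 0.0:
        return l * l * math.exp(-l * l / 2)
    if s == 1.0:
        return 2 * l * l * ((1 + l * l / 4) * Phi(-l / 2)
                            - l / (2 * math.sqrt(2 * math.pi)) * math.exp(-l * l / 8))
    r = math.sqrt(s)
    return l * l / (1 - s) * (Phi(-l / (2 * r))
                              + (1 - 2 * s) * math.exp(l * l * (s - 1) / 2) * Phi(l / (2 * r) - l * r))


def psi(x):
    """Limit profile of the regime s -> infinity."""
    return x * math.sqrt(2 / math.pi) * math.exp(-x * x / 8) - x * x * Phi(-x / 2)


def argmax_golden(func, lo, hi, tol=1e-6):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if func(c) > func(d):
            hi = d  # the maximizer lies in [lo, d]
        else:
            lo = c  # the maximizer lies in [c, hi]
    return 0.5 * (lo + hi)


def l_star(s):
    return argmax_golden(lambda l: F1(s, l), 1e-3, 6.0)


x_star = argmax_golden(psi, 1e-3, 6.0)
print(l_star(0.0), l_star(1.0), x_star)
```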

Proof. The first two statements for s = 1 and s = are simple consequences of Lemma [3] and 
the imphcit function theorem appUed to Fi{s,l) respectively at point and (0, /*(0)), 

using the fact that ^(1, /*(!)) / and ^(0,/*(0)) / (see Equations ^ and §8^ 
below) . 

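Let us now consider the case $s \to \infty$. One has $\psi'(x) = \sqrt{\frac{2}{\pi}}\,e^{-\frac{x^2}{8}} - 2x\Phi\left(-\frac{x}{2}\right)$ so that
$\psi'(0) = \sqrt{\frac{2}{\pi}} > 0$ and, by (37), $\psi'(x) \sim -\sqrt{\frac{2}{\pi}}\,e^{-\frac{x^2}{8}} < 0$ as $x \to +\infty$. Moreover, by the
lower-bound in (37),

$\left(e^{\frac{x^2}{8}}\psi'(x)\right)' = -2\left(1 + \frac{x^2}{4}\right)e^{\frac{x^2}{8}}\Phi\left(-\frac{x}{2}\right) + \frac{x}{\sqrt{2\pi}} < 0$ for $x > 0$.

Whence the existence and uniqueness of $x^*$.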



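For $s > 1/2$, by the upper-bound in (37), $(2s-1)e^{\frac{\ell^2(s-1)}{2}}\Phi\left(\frac{\ell}{2\sqrt{s}} - \ell\sqrt{s}\right) \leq \sqrt{\frac{2}{\pi}}\frac{\sqrt{s}}{\ell}e^{-\frac{\ell^2}{8s}}$ so that for
$s > 1$, $F_1(s,\ell) \leq \frac{s}{s-1}\psi(\ell/\sqrt{s})$. For $\varepsilon > 0$, one deduces that $\limsup_{s\to\infty}\sup_{\ell : \ell/\sqrt{s}\notin[x^*-\varepsilon,x^*+\varepsilon]}F_1(s,\ell) \leq
\sup_{x\notin[x^*-\varepsilon,x^*+\varepsilon]}\psi(x)$. On the other hand,

$\liminf_{s\to\infty}F_1(s,\ell^*(s)) \geq \lim_{s\to\infty}F_1(s,x^*\sqrt{s}) = \psi(x^*) > \sup_{x\notin[x^*-\varepsilon,x^*+\varepsilon]}\psi(x).$

Hence, for $s$ large enough, $F_1(s,\ell^*(s)) > \sup_{\ell : \ell/\sqrt{s}\notin[x^*-\varepsilon,x^*+\varepsilon]}F_1(s,\ell)$ and $\ell^*(s)/\sqrt{s} \in [x^*-\varepsilon, x^*+\varepsilon]$. Since
$\varepsilon > 0$ is arbitrary, this yields (55).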



5.2. Comparison with the constant average acceptance rate strategy. Under the
stationarity assumption, it is standard (see (Roberts et al., 1997)) to associate to the optimal
value of $\ell \approx \frac{2.38}{\sqrt{I}}$ an average acceptance rate (see the introduction). Indeed, in this case, there
is a one-to-one correspondence between $\ell$ and the limiting acceptance rate

$\mathrm{acc}(I,I,\ell) = \frac{\Gamma(I,I,\ell)}{\ell^2} = 2\Phi\left(-\frac{\ell\sqrt{I}}{2}\right).$

More precisely, $\ell \approx \frac{2.38}{\sqrt{I}}$ is equivalent to

$\mathrm{acc}(I,I,\ell) \approx 2\Phi\left(-\frac{2.38}{2}\right) \approx 0.23$

which does not depend on $I$. A natural strategy is thus to adjust the variance in such a way
that the average acceptance rate is 23%. In this section, we discuss how to use an equivalent
approach in the transient phase. Of course, the interest of the constant average acceptance
rate strategy is that it can be implemented using the so-called adaptive scaling Metropolis
algorithm (see Andrieu and Robert (2001); Atchadé and Rosenthal (2005)): at time $k$, the
standard deviation $\sigma$ is chosen equal to $\exp(\theta_k)$ where $\theta_k$ is updated using the Robbins-Monro
procedure $\theta_{k+1} = \theta_k + \gamma_{k+1}(\alpha_{k+1} - \alpha)$ where $\alpha_{k+1}$ is the observed acceptance rate (1) at time
$k$, $\alpha \in (0,1)$ is the target acceptance rate and $\gamma_{k+1}$ is a step size.
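
The adaptive scaling procedure just described can be sketched as follows for a standard Gaussian product target. This is only an illustration under stated assumptions: the step sizes $\gamma_k = k^{-0.6}$, the dimension, the number of iterations and the function name are hypothetical choices, and the Metropolis acceptance probability of the current proposal is used as the observed acceptance rate.

```python
import math
import random


def adaptive_scaling_rwm(n=20, n_iter=5000, target_acc=0.27, seed=0):
    """RWM on the n-fold product of N(0, 1) with Robbins-Monro adaptation of log(sigma)."""
    rng = random.Random(seed)
    x = [0.0] * n          # initial condition (0, ..., 0)
    theta = 0.0            # theta_k = log of the proposal standard deviation
    acc_history = []
    for k in range(n_iter):
        sigma = math.exp(theta)
        prop = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        # log of the Metropolis ratio for V(x) = x^2 / 2 on each component
        log_ratio = 0.5 * (sum(xi * xi for xi in x) - sum(pi * pi for pi in prop))
        acc_prob = 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
        if rng.random() < acc_prob:
            x = prop
        gamma = 1.0 / (k + 1) ** 0.6                # hypothetical step size sequence
        theta += gamma * (acc_prob - target_acc)    # Robbins-Monro update
        acc_history.append(acc_prob)
    mean_acc = sum(acc_history[-1000:]) / 1000.0
    return math.exp(theta), mean_acc


sigma_final, mean_acc = adaptive_scaling_rwm()
print(sigma_final, mean_acc)
```

When the acceptance probability is above the target, $\theta$ increases and the proposals become larger, which lowers the acceptance rate; the update therefore drives the average acceptance rate towards the target value.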

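The first question is: for given values of $a$ and $b$, does an acceptance rate $\alpha \in (0,1)$
correspond in a one-to-one way to a value $\ell > 0$? The average acceptance rate is (see
Proposition 1)

$\mathrm{acc}(a,b,\ell) = \frac{\Gamma(a,b,\ell)}{\ell^2} = \Phi\left(-\frac{\ell b}{2\sqrt{a}}\right) + \exp\left(\frac{\ell^2(a-b)}{2}\right)\Phi\left(\ell\left(\frac{b}{2\sqrt{a}} - \sqrt{a}\right)\right).$

We recall that we only consider the case $b > 0$, see the discussion at the beginning of Section 5.
(Actually, if $b \leq 0$, $\mathrm{acc}(a,b,\ell) \geq \Phi\left(-\frac{\ell b}{2\sqrt{a}}\right) \geq 1/2$ for all $a > 0$ and $\ell > 0$, so that it is not
possible to solve $\mathrm{acc}(a,b,\ell) = \alpha$ for any values of $\alpha$, which is again an indication of the
ill-posedness of the optimization procedure when $b \leq 0$.)
Now, for $b > 0$, observe that

$\mathrm{acc}(a,b,\ell) = H\left(\frac{a}{b}, \ell\sqrt{b}\right)$

where

(56) $H(s,\ell) = \Phi\left(-\frac{\ell}{2\sqrt{s}}\right) + e^{\frac{\ell^2(s-1)}{2}}\Phi\left(\frac{\ell}{2\sqrt{s}} - \ell\sqrt{s}\right).$

Solving $\mathrm{acc}(a,b,\ell) = \alpha$ amounts to solving $H\left(\frac{a}{b}, \ell\sqrt{b}\right) = \alpha$.

Lemma 5. Let $s > 0$ be fixed. The function $\ell \mapsto H(s,\ell)$ is decreasing. Moreover, for all
$\alpha \in (0,1)$ there exists a unique solution to the equation $H(s,\ell) = \alpha$. This solution is denoted
$\ell^\alpha(s)$ in the following.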

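Proof. Let us first prove that, for a given $s > 0$, $\ell \mapsto H(s,\ell)$ is strictly decreasing. We compute

$\frac{\partial H}{\partial\ell}(s,\ell) = -\sqrt{\frac{s}{2\pi}}\exp\left(-\frac{\ell^2}{8s}\right) + \ell(s-1)e^{\frac{\ell^2(s-1)}{2}}\Phi\left(\frac{\ell}{2\sqrt{s}} - \ell\sqrt{s}\right).$

The right-hand side is negative for $s \in (0,1]$. For $s > 1$, we have, using the upper-bound in
(37),

$\frac{\partial H}{\partial\ell}(s,\ell) \leq -\frac{\sqrt{s}}{\sqrt{2\pi}(2s-1)}\exp\left(-\frac{\ell^2}{8s}\right) < 0.$

This shows that $\ell \mapsto H(s,\ell)$ is strictly decreasing.

It is easy to see that $H(s,0) = 1$. Now, using again the upper-bound in (37) for $s > 1/2$,
one has

(57) $H(s,\ell) \leq \Phi\left(-\frac{\ell}{2\sqrt{s}}\right) + 1_{\{s\leq 1/2\}}\exp\left(\frac{\ell^2}{2}(s-1)\right) + 1_{\{s>1/2\}}\frac{2\sqrt{s}}{\sqrt{2\pi}\,\ell(2s-1)}\exp\left(-\frac{\ell^2}{8s}\right)$

so that $\lim_{\ell\to\infty}H(s,\ell) = 0$. By continuity and strict monotonicity of $H$, we then get that for
any $\alpha \in (0,1)$ there exists a unique $\ell^\alpha(s)$ such that $H(s,\ell^\alpha(s)) = \alpha$.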

As a corollary of this Lemma, we get that for any $a > 0$, $b > 0$, $\alpha \in (0,1)$, there exists a
unique $\ell^\alpha(a,b) > 0$ such that

$\mathrm{acc}(a,b,\ell^\alpha(a,b)) = \alpha$

and that

(58) $\ell^\alpha(a,b) = \frac{1}{\sqrt{b}}\,\ell^\alpha\left(\frac{a}{b}\right).$

In particular, $\ell^\alpha(s) = \ell^\alpha(s,1)$.

Let us now compare the strategy based on the maximization of the exponential rate of
convergence, presented in Section 5.1, with a strategy based on a constant average acceptance
rate. By comparing (52) and (58), we observe that the scalings of $\ell^*$ and $\ell^\alpha$ in terms of $a$
and $b$ are the same, which is already an indication of the fact that a constant acceptance rate
strategy is very natural.

Near equilibrium, namely in the limit $a/b \to 1$, the two strategies are the same if $\alpha$ is chosen
such that $\ell^*(1) = \ell^\alpha(1)$ which corresponds to

(59) $\alpha \approx 0.35.$

Notice that this value is not far (but different, since we take into account the transient phase
around equilibrium) from the acceptance probability 0.23 obtained under the stationarity
assumption.

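To study the two limits $s \to 0$ and $s \to \infty$, we need the following lemma.

Lemma 6. We have the following asymptotic behaviors for the function $\ell^\alpha$:

• $s \to 0$: For any $\alpha \in (0,1)$,

(60) $\lim_{s\to 0}\ell^\alpha(s) = \sqrt{-2\ln(\alpha)},$

and thus $\ell^\alpha(a,b) \sim_{a/b\to 0} \frac{\sqrt{-2\ln(\alpha)}}{\sqrt{b}}$.

• $s \to \infty$: For any $\alpha \in (0,\frac{1}{2})$,

(61) $\lim_{s\to\infty}\frac{\ell^\alpha(s)}{\sqrt{s}} = -2\Phi^{-1}(\alpha),$

and thus $\ell^\alpha(a,b) \sim_{a/b\to\infty} -2\Phi^{-1}(\alpha)\frac{\sqrt{a}}{b}$.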
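Proof. Let us first consider the case $s \to 0$. Observe that for any given $\ell > 0$, it holds

(62) $\lim_{s\to 0}H(s,\ell) = \exp\left(-\frac{\ell^2}{2}\right).$

Let $\varepsilon \in (0, \sqrt{-2\ln(\alpha)})$. By the monotonicity property of $\ell \mapsto H(s,\ell)$ stated in Lemma 5,

$\sup_{\ell\geq\sqrt{-2\ln(\alpha)}+\varepsilon}H(s,\ell) \leq H(s,\sqrt{-2\ln(\alpha)}+\varepsilon) \to_{s\to 0} \alpha\exp\left(-\varepsilon\sqrt{-2\ln(\alpha)} - \frac{\varepsilon^2}{2}\right) < \alpha.$

In the same way, $\liminf_{s\to 0}\inf_{\ell\leq\sqrt{-2\ln(\alpha)}-\varepsilon}H(s,\ell) \geq \alpha\exp\left(\varepsilon\sqrt{-2\ln(\alpha)} - \frac{\varepsilon^2}{2}\right) > \alpha$. Therefore, for $s$ close enough to 0,
$\ell^\alpha(s) \in [\sqrt{-2\ln(\alpha)} - \varepsilon, \sqrt{-2\ln(\alpha)} + \varepsilon]$ and (60) holds.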

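Let us now consider the case $s \to \infty$. Observe that $H(s,\ell) \geq \Phi\left(-\frac{\ell}{2\sqrt{s}}\right)$. Hence, if
$\liminf_{s\to\infty}\frac{\ell^\alpha(s)}{\sqrt{s}} = 0$ then $\liminf_{s\to\infty}H(s,\ell^\alpha(s)) \geq \frac{1}{2}$, which contradicts $H(s,\ell^\alpha(s)) = \alpha < \frac{1}{2}$. Therefore, there exists a constant
$C_1 > 0$ such that, for large enough $s$, $\ell^\alpha(s) \geq C_1\sqrt{s}$. Using (57) for the upper-bound, we get
that, for $s$ large enough,

$\Phi\left(-\frac{\ell^\alpha(s)}{2\sqrt{s}}\right) \leq \alpha \leq \Phi\left(-\frac{\ell^\alpha(s)}{2\sqrt{s}}\right) + \frac{2\sqrt{s}}{\sqrt{2\pi}\,\ell^\alpha(s)(2s-1)}\exp\left(-\frac{(\ell^\alpha(s))^2}{8s}\right).$

Therefore

$\lim_{s\to\infty}\Phi\left(-\frac{\ell^\alpha(s)}{2\sqrt{s}}\right) = \alpha.$

The continuity of $\Phi^{-1}$ concludes the proof of (61).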

By comparing (54) with (60), we observe that in the regime $a/b \to 0$, the two strategies are
the same if $\alpha$ is chosen such that $\sqrt{-2\ln(\alpha)} = \sqrt{2}$ namely

(63) $\alpha = e^{-1} \approx 0.37.$

Finally, by comparing (55) with (61), we observe that in the regime $a/b \to \infty$, the two
strategies are the same if $\alpha$ is chosen such that $-2\Phi^{-1}(\alpha) = x^*$ namely

(64) $\alpha \approx 0.27.$

In view of (59), (63) and (64), the constant average acceptance rate strategy with target
value between 1/4 and 1/3 seems to be a very good strategy, since it is almost equivalent to
the optimal exponential rate strategy.
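
Since $\ell \mapsto H(s,\ell)$ decreases from 1 to 0 (Lemma 5), the value $\ell^\alpha(s)$ can be obtained by bisection. The sketch below assumes the expression $H(s,\ell) = \Phi(-\ell/(2\sqrt s)) + e^{\ell^2(s-1)/2}\Phi(\ell/(2\sqrt s) - \ell\sqrt s)$ of (56) and is restricted to $0 < s \leq 1$ to avoid overflow of the exponential. The bracket $[0, 50]$ and the function names are illustrative. It also evaluates the three acceptance rates (59), (63) and (64) from the rounded constants $\ell^*(1) \approx 1.85$ and $x^* \approx 1.22$.

```python
import math


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def H(s, l):
    """Limiting mean acceptance rate acc(s, 1, l), for 0 < s <= 1 in this sketch."""
    r = math.sqrt(s)
    return Phi(-l / (2 * r)) + math.exp(l * l * (s - 1) / 2) * Phi(l / (2 * r) - l * r)


def l_alpha(s, alpha, lo=0.0, hi=50.0, tol=1e-10):
    """Solve H(s, l) = alpha by bisection, using that H(s, .) is decreasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if H(s, mid) > alpha:
            lo = mid   # acceptance rate still too high: increase l
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha_near_equilibrium = H(1.0, 1.85)       # regime a/b -> 1
alpha_small_ratio = math.exp(-1.0)          # regime a/b -> 0
alpha_large_ratio = Phi(-1.22 / 2.0)        # regime a/b -> infinity
print(alpha_near_equilibrium, alpha_small_ratio, alpha_large_ratio)
print(l_alpha(1.0, 0.234))                  # close to the stationary value 2.38
```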

6. Numerical experiments

In this section we present numerical experiments to illustrate results from Section 5.

6.1. On the choice of the target average acceptance rate. In this section, we would like
to discuss the choice of the average acceptance rate $\alpha$ in the constant average acceptance rate
strategy. As mentioned above, we identified three different values of $\alpha$ for the constant average
acceptance rate to be equivalent to the optimization of the exponential rate of convergence,
depending on the regimes: $\frac{a}{b} \to 1$ ($\alpha \approx 0.35$); $\frac{a}{b} \to 0$ ($\alpha \approx 0.37$); $\frac{a}{b} \to \infty$ ($\alpha \approx 0.27$).

In practice, a value has to be chosen for $\alpha$. On Figure 2, we plot as a function of $a$
and $b$ the relative loss in terms of exponential rate of convergence, for the constant average
acceptance rate strategy compared to the optimization of the exponential rate of convergence:
$\frac{F(a,b,\ell^*(a,b)) - F(a,b,\ell^\alpha(a,b))}{F(a,b,\ell^*(a,b))}$, for the three values of $\alpha$ mentioned above.

The main output of these numerical experiments is that the choice $\alpha \approx 0.27$ seems to be
the most robust, namely the one which leads to an exponential rate of convergence the closest
to the optimal one, over the largest range of variation of $a$ and $b$. This confirms the interest
of the constant acceptance rate strategy.

Figure 2. Relative loss $\frac{F(a,b,\ell^*(a,b)) - F(a,b,\ell^\alpha(a,b))}{F(a,b,\ell^*(a,b))}$ as a function of $a$ for $b = 1, 0.1, 10$ and $\alpha \approx 0.27$ (solid line), $\alpha \approx 0.35$ (dashed line), $\alpha = e^{-1} \approx 0.37$ (dotted line).

6.2. Gaussian case. Let us first consider the Gaussian target $V(x) = \frac{1}{2}x^2 + \frac{1}{2}\ln(2\pi)$ (see
Remark 5), with a Gaussian initial condition $X_0$ such that $m(0) = \mathbb{E}(X_0)$ and $s(0) = \mathbb{E}(X_0^2)$.
At time $t$, the law of $X_t$ solution to the limiting stochastic differential equation (7) is Gaussian
with mean $m(t)$ and second moment $s(t)$, where $m$ and $s$ satisfy (48). The Kullback-Leibler
divergence admits an analytical expression in terms of $m$ and $s$:

$H(\psi_t|\psi_\infty) = \frac{1}{2}\left(s(t) - \ln(s(t) - m(t)^2) - 1\right),$



and its derivative writes

$\frac{d}{dt}H(\psi_t|\psi_\infty) = \frac{1}{2}\left(1 - \frac{1}{s - m^2}\right)\frac{ds}{dt} + \frac{m}{s - m^2}\frac{dm}{dt} = \frac{1}{2}\left(F(s,1,\ell)(1-s) - \frac{F(s,1,\ell)(1-s) + 2m^2\mathcal{G}(s,1,\ell)}{s - m^2}\right).$

In the Gaussian case, it is thus possible, for each time $t$ (and thus for fixed values of $m(t)$ and
$s(t)$), to maximize $\left|\frac{d}{dt}H(\psi_t|\psi_\infty)\right|$ in $\ell$. This yields the best strategy that we could think of and
implement numerically, in terms of the speed of convergence of the Kullback-Leibler divergence
to 0. (We also considered choosing $\ell(t)$ minimizing some integral criterion like $\int_0^{+\infty}H(\psi_t|\psi_\infty)\,dt$
but had no clue about how to solve the associated Hamilton-Jacobi equation.) In the following,
let us denote

$\ell^{ent}(m,s) = \mathrm{argmax}_{\ell > 0}\left(\frac{F(s,1,\ell)(1-s) + 2m^2\mathcal{G}(s,1,\ell)}{s - m^2} - F(s,1,\ell)(1-s)\right).$



In the numerical experiments, we thus compare four strategies: (i) the constant $\ell$ strategy,
with $\ell = 2.38$ (which is the optimal value under stationarity assumption, see (Roberts et al., 1997));
(ii) the constant average acceptance rate strategy, using $\ell^\alpha(a,b)$ (for $\alpha \approx 0.27$ and
$\alpha = e^{-1} \approx 0.37$); (iii) the optimal exponential rate of convergence $\ell^*(a,b)$; (iv) the optimal
strategy for the convergence of the entropy $\ell^{ent}(m,s)$. Notice that in the Gaussian case,
$a = \mathbb{E}(X_t^2) = s(t)$ and $b = 1$, so that $\ell^\alpha$ and $\ell^*$ are actually functions of $s$ only. Let us
also mention that there are actually two ways to implement (ii): either using a numerical
approximation for $\ell^\alpha(a,1)$ (and an estimator $\hat{a}$ of $a$), or using the adaptive scaling Metropolis
algorithm (Andrieu and Robert (2001); Atchadé and Rosenthal (2005)) mentioned at the
beginning of Section 5.2.

The dimension is fixed to $n = 100$. To assess the convergence, we observe, as a function of
the so-called burn-in time $t_0$, the convergence to zero of the square biases:

(65) (e {^,,T+to) - l) ' and (e {iZ,T+t,)) ' 

where 

A:=to+l 



and 



k=to + l 

The expectations in (65) are approximated by empirical averages over 200 independent
realizations of $(X_k^{1,n}, \ldots, X_k^{n,n})_{0\leq k\leq t_0+T}$. The size of the time window is $T = 1500$. When needed,
we estimate the values for $s = a$ and $m$ using empirical averages over the $n = 100$ components
of the process.

On Figure 3, we first consider the initial condition $X_0 = (0, \ldots, 0)$. The first moment
is thus already at equilibrium, and we only observe the convergence of the second moment.
Clearly, the constant $\ell$ strategy is the worst. Using $\ell^*$ yields a convergence which is almost
the optimal one, obtained for $\ell = \ell^{ent}$. The constant average acceptance rate strategies also lead to
excellent results in terms of convergence compared to the optimal scenario, even though they are
here implemented using an adaptive scaling Metropolis algorithm.

On Figure 4, we perform similar experiments with the initial condition $X_0 = (10, \ldots, 10)$.
We observe the convergence of the first and second moments. It is clear that constant scaling
is outperformed by all the other strategies. We notice also that the adaptive scaling Metropolis
implementation leads to slightly slower convergence compared to an implementation
using $\ell^\alpha(a,1)$. This difference could certainly be reduced by optimizing the parameters in the
adaptive scaling Metropolis algorithm.

In conclusion, we observed that: (i) the constant $\ell$ strategy performs poorly; (ii) the constant
average acceptance rate strategy (using $\ell^\alpha$) leads to very close convergence curves compared
to the optimal exponential rate of convergence strategy (using $\ell^*$); (iii) the optimal exponential
rate of convergence strategy is as good as the best strategy one could design in terms
of entropy decay (using $\ell^{ent}$).

Figure 3. Square bias of the second moment estimator as a function of the burn-in time $t_0$ for various
strategies. The initial condition is $(0, \ldots, 0)$, and the constant acceptance rate
strategies are implemented using an adaptive scaling Metropolis algorithm.

6.3. Non-Gaussian case. Let us now consider a non-Gaussian target, and more precisely 
a double-well potential. In order to satisfies the assumptions of Theorem [H we consider the 
function V given up to a normalizing additive constant by: 



V{x) 




if \x\ < 1, 
otherwise. 




2(x - l)(x + if + 2{x - l)2(x + 1) if |x| < 1, 
8x — 8 sign(x) otherwise, 

2(x + if + 8(x - l)(x + 1) + 2(x - 1)2 if |x| < 1, 

otherwise. 

Figure 4. Square bias of the estimators (top and bottom panels) as a function of the burn-in time $t_0$ for various strategies. The initial condition is $(10, \ldots, 10)$. The notations $l^{\alpha}$-A and $l^{\alpha}$-N refer to the two implementations of the constant average acceptance rate: the adaptive scaling Metropolis algorithm and the numerical approximation of $l^{\alpha}(a, 1)$.

Of course, no analytical expression for the entropy is available in this context, and we thus concentrate on the three following strategies: (i) the constant $l$ strategy; (ii) $l = l^{\alpha}(a,b)$; and (iii) $l = l^{*}(a,b)$. For the constant $l$ strategy, we use the previously defined constant value $l = 1.18$. When needed, $a$ and $b$ are approximated by the estimators over the $n$ components $a = \frac{1}{n}\sum_{i=1}^{n} (V'(X^{i}_t))^2$ and $b = \frac{1}{n}\sum_{i=1}^{n} V''(X^{i}_t)$. The parameters $n = 100$ and $T = 1500$ are the same as in the Gaussian case.
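As an illustration of these estimators, here is a minimal sketch that computes the empirical values of $a$ and $b$ from the current state of the chain. It assumes that the estimators are plain averages over the $n$ components and uses the double-well derivatives $V'$ and $V''$ given above; the name `estimate_a_b` is hypothetical.

```python
import math

def dV(x):
    """V' for the double-well potential of this section."""
    if abs(x) < 1:
        return 2 * (x - 1) * (x + 1) ** 2 + 2 * (x - 1) ** 2 * (x + 1)
    return 8 * x - 8 * math.copysign(1.0, x)

def d2V(x):
    """V'' for the double-well potential of this section."""
    if abs(x) < 1:
        return 2 * (x + 1) ** 2 + 8 * (x - 1) * (x + 1) + 2 * (x - 1) ** 2
    return 8.0

def estimate_a_b(state):
    """Empirical means of V'(x)^2 and V''(x) over the components of the chain."""
    n = len(state)
    a = sum(dV(x) ** 2 for x in state) / n
    b = sum(d2V(x) for x in state) / n
    return a, b

# Initial condition (10, ..., 10) with n = 100 components.
a, b = estimate_a_b([10.0] * 100)
print(a, b)  # 5184.0 8.0
```

At the initial condition $(10, \ldots, 10)$ one has $V'(10) = 72$, hence a very large $a$ compared with $b$, which is the regime where a constant $l$ is far from adapted.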

Let us first consider as an initial condition $X_0 = (10, \ldots, 10)$. On Figure 5 we observe the convergence of the first moment to its equilibrium value (namely 0). Again, the constant $l$ strategy appears to be very bad, and the other strategies perform approximately equally well.

Figure 5. Square bias of the first-moment estimator for the non-Gaussian target as a function of the burn-in time $t_0$ for various strategies. The initial condition is $(10, \ldots, 10)$. The notations $l^{\alpha}$-A and $l^{\alpha}$-N refer to the two implementations of the constant average acceptance rate: the adaptive scaling Metropolis algorithm and the numerical approximation of $l^{\alpha}(a, b)$.

Finally, let us consider $X_0$ distributed according to a Gaussian distribution with mean 1 and variance $0.143\,\mathrm{Id}$. The mean and the variance are chosen in such a way that $a = \mathbb{E}(V'(X_0)^2) = 5.24$ and $b = \mathbb{E}(V''(X_0)) = 5.24$. On Figure 6 we observe the convergence of the first and second moments to their equilibrium values (namely 0 and 0.96). For the constant acceptance rate strategy, we compare the results obtained with $\alpha = 0.35$ and $\alpha = 0.27$. Here, it is much more complicated to draw general conclusions from these plots. Basically, all strategies yield
comparable results. One could wonder why performs poorly for the first moment. The 
reason is probably that its bias cannot be encoded into a and b which are integrals of even 
functions with respect to the current marginal distribution. 
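The numerical values quoted for this experiment can be checked by one-dimensional quadrature. The sketch below is only an illustration: it assumes the double-well potential $V(x) = (x-1)^2(x+1)^2$ for $|x| < 1$ and $4(|x|-1)^2$ otherwise, a target density proportional to $\exp(-V)$, and a plain trapezoidal rule with hypothetical helper names.

```python
import math

def V(x):
    if abs(x) < 1:
        return (x - 1) ** 2 * (x + 1) ** 2
    return 4 * (abs(x) - 1) ** 2

def dV(x):
    if abs(x) < 1:
        return 2 * (x - 1) * (x + 1) ** 2 + 2 * (x - 1) ** 2 * (x + 1)
    return 8 * x - 8 * math.copysign(1.0, x)

def d2V(x):
    if abs(x) < 1:
        return 2 * (x + 1) ** 2 + 8 * (x - 1) * (x + 1) + 2 * (x - 1) ** 2
    return 8.0

def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# a and b under the Gaussian initial law with mean 1 and variance 0.143.
mean, var = 1.0, 0.143
a0 = integrate(lambda x: dV(x) ** 2 * gaussian_pdf(x, mean, var), -4.0, 6.0)
b0 = integrate(lambda x: d2V(x) * gaussian_pdf(x, mean, var), -4.0, 6.0)

# Second moment of one component at equilibrium (density proportional to exp(-V)).
Z = integrate(lambda x: math.exp(-V(x)), -6.0, 6.0)
m2 = integrate(lambda x: x * x * math.exp(-V(x)), -6.0, 6.0) / Z

print(round(a0, 2), round(b0, 2), round(m2, 2))
```

Both $a$ and $b$ come out close to 5.24 and the equilibrium second moment close to 0.96, in line with the values used in the text. The first moment vanishes at equilibrium by symmetry of $V$.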

In conclusion, we observed that the results obtained with the constant acceptance rate 
strategy (even when it is implemented using an adaptive scaling Metropolis algorithm) are 
very similar to those obtained with the optimal exponential rate of convergence strategy. 
Figure 6. Square bias of the first-moment estimator (top) and of the second-moment estimator (bottom) for the non-Gaussian target as a function of the burn-in time for various strategies and Gaussian initial condition. The constant acceptance rate strategies are implemented using an adaptive scaling Metropolis algorithm.

Appendix A. Proof of Lemma 3

Recall first that the function $(s,l) \mapsto F_1(s,l)$ is $C^\infty$ on $\mathbb{R}_+ \times \mathbb{R}_+$. It is easily checked that for any $s > 0$, $F_1(s,0) = 0$ and $\lim_{l \to \infty} F_1(s,l) = 0$. With the positivity of $F_1$ and the continuity of $l \mapsto F_1(s,l)$, one deduces the existence of a point $l^*(s) > 0$ such that $F_1(s,l^*(s)) = \max_{l > 0} F_1(s,l)$.

When $s = 0$, $F_1(0,l) = l^2 \exp\left(-\frac{l^2}{2}\right)$. This function admits a unique maximum at the point $l^*(0) = \sqrt{2}$. For further use, we observe that

(68) $\frac{\partial^2 F_1}{\partial l^2}(0, l^*(0)) \neq 0.$
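The case $s = 0$ is easy to check numerically. The sketch below assumes the reading $F_1(0,l) = l^2 \exp(-l^2/2)$, whose second derivative is $(l^4 - 5l^2 + 2)\exp(-l^2/2)$; the grid search and the function names are illustrative only.

```python
import math

def F1_s0(l):
    """Assumed form of F_1(0, l)."""
    return l * l * math.exp(-l * l / 2)

def d2F1_s0(l):
    """Second derivative in l of l^2 exp(-l^2 / 2)."""
    return (l ** 4 - 5 * l ** 2 + 2) * math.exp(-l * l / 2)

# Brute-force maximization on a fine grid of (0, 10].
grid = [k * 1e-3 for k in range(1, 10001)]
l_star = max(grid, key=F1_s0)

print(l_star, math.sqrt(2), d2F1_s0(math.sqrt(2)))
```

The maximizer found on the grid agrees with $\sqrt{2}$ up to the grid step, and the second derivative there equals $-4/e$, which is nonzero as stated in (68).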
In the case $s = 1$, we compute the derivatives

f (M)^(4/ + 2/3)^(4 



dp 



1,1) = (4 + 6/2)$ 



/27r 
101 



exp 



As a consequence, at a critical point of $l \mapsto F_1(1,l)$,



(69) 



-(!,/) = (/2-6)<I> 



2tt 

I 
2 



P 
8 
P 



We deduce that any local maximum belongs to $(0, \sqrt{6}]$ and any local minimum to $[\sqrt{6}, +\infty)$. Since there is a local minimum (resp. maximum) between two distinct local maxima (resp. minima), we conclude that $l \mapsto F_1(1,l)$ admits a unique local maximum, which is also a global maximum and belongs to $(0, \sqrt{6}]$, and no local minimum on $(0, +\infty)$. For further use, we observe that $\frac{\partial F_1}{\partial l}(1, \sqrt{6}) \neq 0$ and thus (from (69))

(70) $\frac{\partial^2 F_1}{\partial l^2}(1, l^*(1)) \neq 0.$

Let us now consider the case $s \in (0,1) \cup (1,\infty)$. The partial derivative of $F_1$ with respect to $l$ is:



(71) 



dFi 



is,l) 



s)]Fi{s,l) + P 



2s 
— exp 
vr 



8s 



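Of course, $\frac{\partial F_1}{\partial l}(s, l^*(s)) = 0$. Then, at any critical point of $l \mapsto F_1(s,l)$, we have (using the fact that $\frac{\partial F_1}{\partial l}(s,l) = 0$) $\frac{\partial^2 F_1}{\partial l^2}(s,l) = p(s,l)$ where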



p{s,l) 



P 



1 + s] Fi(s,/) -21] 



exp 



'8s 



2V^ 



so that $\frac{\partial^2 F_1}{\partial l^2}(s,l) = p(s,l)$ with (using again $\frac{\partial F_1}{\partial l}(s,l) = 0$ to eliminate $F_1(s,l)$),



pis, I) = l- 



-P + sP + 6 
P -sP-2 



exp 



'8s 



+ 21' 



.P-sP 



P -sP-2 



21- 



,P-sP 



P -sP-2 



where 



Xis,l) = $ 



2Vi 



1 P - sP 



I P - sP 




exp 



P_ 

'8s 



A.1. The case $s > 1$. Let us assume $s > 1$. In this section, we will prove that the function $l \mapsto p(s,l)$ is negative on some interval $(0, \tilde l)$ and positive on $(\tilde l, \infty)$, which is equivalent to showing that $l \mapsto \chi(s,l)$ is negative on $(0, \tilde l)$ and positive on $(\tilde l, \infty)$, since the ratio $\frac{l^2 - s l^2 - 4}{l^2 - s l^2 - 2}$ is positive. This implies that $l \mapsto F_1(s,l)$ has a unique global maximum at the point $l^*(s)$. Indeed, if $l_1(s) < l_2(s)$ are two points in $\arg\max_{l > 0} F_1(s,l)$, then $l_2(s) \leq \tilde l$ and we reach a contradiction by noticing that there is necessarily a local minimum of $l \mapsto F_1(s,l)$ in the interval $(l_1(s), l_2(s))$.
We note that
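where

$$P(s,y) = -\frac{(1-s)^2}{4}\, y^3 + \left(\frac{3}{2}(1-s) + s(1-s)^2\right) y^2 - \bigl(2 + 14 s(1-s)\bigr)\, y + 24 s.$$

We will show that $y \mapsto P(s,y)$ is positive on some interval $(0, \bar l^2)$ and negative on $(\bar l^2, \infty)$. This means that $l \mapsto \chi(s,l)$ is increasing on $(0, \bar l)$ and decreasing on $(\bar l, \infty)$. Since $\lim_{l \to 0} \chi(s,l) = -\infty$, $\lim_{l \to \infty} \chi(s,l) = 0$ and $l \mapsto \chi(s,l)$ is a continuous function, this implies that $l \mapsto \chi(s,l)$ is negative on some interval $(0, \tilde l)$ and positive on $(\tilde l, \infty)$, which concludes the proof.

Let us now study the polynomial $y \mapsto P(s,y)$. Let us introduce

$$Q(s,y) = y^2 - 4\left(\frac{1}{1-s} + s\right) y + \frac{48 s}{1-s}.$$

The discriminant of $y \mapsto Q(s,y)$ is

$$\Delta(s) = \frac{16}{(1-s)^2}\left(s^2(1-s)^2 - 10 s(1-s) + 1\right).$$

Since $s > 1$, and thus $s(1-s) < 0$, then $\Delta(s) > 0$. The polynomial $y \mapsto Q(s,y)$ has two roots:

$$y_- = 2\left(\frac{1}{1-s} + s\right) - \frac{2}{|1-s|}\left(s^2(1-s)^2 - 10 s(1-s) + 1\right)^{1/2}$$

and

$$y_+ = 2\left(\frac{1}{1-s} + s\right) + \frac{2}{|1-s|}\left(s^2(1-s)^2 - 10 s(1-s) + 1\right)^{1/2}.$$

Then, $Q(s,y) < 0$ if and only if $y \in (y_-, y_+)$. The roots of $y \mapsto P(s,y)$ are $\{y_-, y_+, y_0\}$ where

$$y_0 = \frac{2}{1-s}.$$

We notice that $y_- < y_+$ and $y_+ > y_0$. We observe that

$$y_- < y_0 \iff 2\left(\frac{1}{1-s} + s\right) - \frac{2}{|1-s|}\left(s^2(1-s)^2 - 10 s(1-s) + 1\right)^{1/2} < \frac{2}{1-s} \iff -s(1-s) < \left(s^2(1-s)^2 - 10 s(1-s) + 1\right)^{1/2}.$$

Thus, since $s > 1$, we have

$$y_- < y_0 < 0 < y_+,$$

and $y \mapsto P(s,y)$ changes its sign at each of its roots $\{y_-, y_+, y_0\}$. Since $\lim_{y \to \infty} P(s,y) = -\infty$, we deduce that $P(s,y) > 0$ for $y \in (0, y_+)$ and $P(s,y) < 0$ for $y \in (y_+, \infty)$. This concludes the proof in the case $s > 1$.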
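The root structure used in this case can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $Q(s,y) = y^2 - 4\left(\frac{1}{1-s} + s\right)y + \frac{48s}{1-s}$, $y_0 = \frac{2}{1-s}$, and the cubic $P$ with linear coefficient $-(2 + 14s(1-s))$ and constant term $24s$, for which $P(s,y) = -\frac{(1-s)^2}{4}(y - y_0)\,Q(s,y)$. Function names are hypothetical.

```python
import math

def Q(s, y):
    return y * y - 4 * (1 / (1 - s) + s) * y + 48 * s / (1 - s)

def P(s, y):
    c = 1 - s
    return (-(c ** 2) / 4 * y ** 3 + (1.5 * c + s * c ** 2) * y ** 2
            - (2 + 14 * s * c) * y + 24 * s)

def roots(s):
    """Return (y_minus, y_0, y_plus); requires a nonnegative discriminant."""
    e = s * (1 - s)
    disc = e * e - 10 * e + 1
    centre = 2 * (1 / (1 - s) + s)
    half_width = 2 / abs(1 - s) * math.sqrt(disc)
    return centre - half_width, 2 / (1 - s), centre + half_width

for s in (1.5, 2.0, 5.0):
    y_minus, y_0, y_plus = roots(s)
    # Expected for s > 1: y_minus < y_0 < 0 < y_plus, with P > 0 on (0, y_plus).
    print(s, y_minus, y_0, y_plus, P(s, 0.5 * y_plus) > 0, P(s, 2 * y_plus) < 0)
```

For instance, $s = 2$ gives $y_- = -8$, $y_0 = -2$ and $y_+ = 12$, so the only sign change of $P$ on $(0, \infty)$ occurs at $y_+$.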

A.2. The case $s < 1$. First, we observe that the maximum of $l \mapsto F_1(s,l)$ is necessarily in $(0, l_0)$ where

$$l_0 = \sqrt{\frac{2}{1-s}}.$$

Indeed, if $l^2 \geq \frac{2}{1-s}$ we have (using the fact that $F_1 > 0$ and the upper bound in the classical inequality (37)):

^\s, I) =(^-1(1- s)) F,{s, I) + I' (-yi^e^p ( -f] + 



dl ' ' ' \l ' Vvr 8sJ \ 2^s 




< 0. 

This shows in particular that $l^*(s) \in (0, l_0)$. In all that follows, we only study the function $F_1(s,l)$ for $l \in (0, l_0)$.

We need to prove that $l \mapsto F_1(s,l)$ admits a unique global maximum on $(0, l_0)$. A sufficient condition is that $l \mapsto p(s,l)$ is negative for $l < l_0$.

Notice that the function $l \mapsto \chi(s,l)$ is smooth on $(0, l_0)$, has the same sign as $p(s,l)$ and that $\lim_{l \to 0} \chi(s,l) = -\infty$ while

. M lo \ 1 /g(i - g) - 6 rr ( ii 

(73) -4-^V^^/|exp^ ^0 



which is negative, using the upper bound in the classical inequality (37).



Let us now study the sign of $\chi(s,l)$. As in the previous case, we first study the sign of $\frac{\partial \chi}{\partial l}$, namely the sign of $P$. We distinguish between two cases.

If $s(1-s) > 5 - \sqrt{24}$, then $\Delta(s) < 0$, so that $Q(s,y) > 0$ for $y < y_0$. This implies that $P(s,y) > 0$ for $y < y_0$. Therefore, in view of (72), $\frac{\partial \chi}{\partial l}(s,l) > 0$ for $l < l_0$. Thus, in this case, $l \mapsto \chi(s,l)$ is increasing from $0$ to $l_0$, going from $-\infty$ to $\chi(s,l_0)$ which is negative. In conclusion, $l \mapsto \chi(s,l)$ is negative on $(0, l_0)$, and $l \mapsto F_1(s,l)$ admits a unique global maximum.

Now, if $s(1-s) < 5 - \sqrt{24}$, $\Delta(s) > 0$, so that $y \mapsto Q(s,y)$ has two roots $y_+ > y_-$. We recall that $y_- < y_0 \iff s(1-s) < \frac{1}{10}$ and notice that $\frac{1}{10} < 5 - \sqrt{24}$. Let us thus distinguish between two subcases.

If $s(1-s) \in [\frac{1}{10}, 5 - \sqrt{24})$, then $0 < y_0 \leq y_- < y_+$. The polynomial $y \mapsto P(s,y)$ changes its sign at each of its roots $\{y_0, y_-, y_+\}$ and $\lim_{y \to \infty} P(s,y) = -\infty$. Thus, in this case, $l \mapsto \chi(s,l)$ is increasing from $0$ to $l_0$, going from $-\infty$ to $\chi(s,l_0)$ which is negative. In conclusion, $\chi(s,l)$ is negative, and $l \mapsto F_1(s,l)$ admits a unique global maximum.

The last subcase to consider is $s(1-s) < \frac{1}{10}$, which is equivalent to

$$s \in (0, s_0) \cup (s_1, 1)$$

with

$$s_0 = \frac{1}{2}\left(1 - \sqrt{\frac{3}{5}}\right) \quad \text{and} \quad s_1 = \frac{1}{2}\left(1 + \sqrt{\frac{3}{5}}\right).$$

In this case, $0 < y_- < y_0 < y_+$. Indeed (using the fact that $s < 1$),

$$y_- > 0 \iff 1 + s(1-s) > \left(s^2(1-s)^2 - 10 s(1-s) + 1\right)^{1/2} \iff s(1-s) > 0,$$

which is true. The polynomial $y \mapsto P(s,y)$ changes its sign at each of its roots $\{y_-, y_0, y_+\}$, and $\lim_{y \to \infty} P(s,y) = -\infty$. Let us denote $l_- = \sqrt{y_-}$. Thus, in this case, $l \mapsto \chi(s,l)$ is increasing from $0$ to $l_-$ (going from $-\infty$ to $\chi(s,l_-)$) and then decreasing from $l_-$ to $l_0$ (going from $\chi(s,l_-)$ to $\chi(s,l_0)$, which is negative). Thus, if $\chi(s,l_-) < 0$, then $\chi(s,l)$ is negative, and $l \mapsto F_1(s,l)$ admits a unique global maximum.
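The thresholds in this case distinction can be checked with a few lines of code. The sketch below assumes, as a reconstruction consistent with the surrounding computations, that the roots of $Q$ are $y_\pm = 2\left(\frac{1}{1-s} + s\right) \pm \frac{2}{|1-s|}\left(s^2(1-s)^2 - 10s(1-s) + 1\right)^{1/2}$ and that $y_0 = \frac{2}{1-s}$; the helper names are illustrative.

```python
import math

# Solutions of s (1 - s) = 1/10.
s0 = 0.5 * (1 - math.sqrt(3 / 5))
s1 = 0.5 * (1 + math.sqrt(3 / 5))

def reduced_disc(s):
    """Sign-determining factor of the discriminant of Q."""
    e = s * (1 - s)
    return e * e - 10 * e + 1

def roots(s):
    """Return (y_minus, y_0, y_plus); requires reduced_disc(s) >= 0."""
    centre = 2 * (1 / (1 - s) + s)
    half_width = 2 / abs(1 - s) * math.sqrt(reduced_disc(s))
    return centre - half_width, 2 / (1 - s), centre + half_width

def last_subcase_ordering(s):
    """True when 0 < y_minus < y_0 < y_plus."""
    y_minus, y_0, y_plus = roots(s)
    return 0 < y_minus < y_0 < y_plus

print(s0, s1)
print(last_subcase_ordering(0.05), last_subcase_ordering(0.95))
print(reduced_disc(0.5) < 0)  # s(1-s) = 1/4 > 5 - sqrt(24): no real root
```

Numerically $s_0 \approx 0.113$ and $s_1 \approx 0.887$. Between the two thresholds $\frac{1}{10}$ and $5 - \sqrt{24} \approx 0.101$ the window for $s(1-s)$ is very narrow, which is why the intermediate subcase is easy to overlook.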

In conclusion, $l \mapsto F_1(s,l)$ admits at least one local maximum and at most two local maxima. The function $l \mapsto F_1(s,l)$ admits two local maxima $l_1 < l_2$ if and only if $\chi(s,l_-) > 0$, in which case $l_1 < l_- < l_2$, and $\frac{\partial F_1}{\partial l}(s,l_1) = \frac{\partial F_1}{\partial l}(s,l_2) = 0$.

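A.2.1. The case $s \in (0, s_0)$. Let us assume the existence of $\underline{s} \in (0, s_0)$ such that $l \mapsto F_1(\underline{s},l)$ admits two local maxima $l_1(\underline{s}) < l_2(\underline{s})$ and let us show that

(74) $\exists (s,l) \in [\underline{s}, s_0] \times \mathbb{R}_+^*, \quad \frac{\partial F_1}{\partial l}(s,l) = \frac{\partial^2 F_1}{\partial l^2}(s,l) = 0.$

If $\frac{\partial^2 F_1}{\partial l^2}(\underline{s}, l_1(\underline{s})) = 0$ or $\frac{\partial^2 F_1}{\partial l^2}(\underline{s}, l_2(\underline{s})) = 0$, we are done. Otherwise, we may apply the implicit function theorem to construct for $i \in \{1,2\}$ a continuous curve $l_i^*(s)$ on a maximal interval $[\underline{s}, \bar s_i)$ with $\bar s_i > \underline{s}$ such that for $s \in [\underline{s}, \bar s_i)$, $\frac{\partial F_1}{\partial l}(s, l_i^*(s)) = 0$ and $\frac{\partial^2 F_1}{\partial l^2}(s, l_i^*(s)) < 0$. In case $\min(\bar s_1, \bar s_2) > s_0$, then, since by the uniqueness part of the implicit function theorem, $\forall s \in [\underline{s}, \min(\bar s_1, \bar s_2))$, $l_1^*(s) < l_2^*(s)$, we contradict the fact that $l \mapsto F_1(s_0,l)$ admits a unique local maximum. Thus, choosing $i \in \{1,2\}$ such that $\bar s_i = \min(\bar s_1, \bar s_2)$, one has $\bar s_i \leq s_0$. Since $l_i^*(s) < l_0(s) = \sqrt{\frac{2}{1-s}}$, we may find an increasing sequence $(s_n)_{n \in \mathbb{N}}$ of elements of $[\underline{s}, \bar s_i)$ converging to $\bar s_i$ and such that $l_i^*(s_n)$ converges to some limit denoted by $l_i^*(\bar s_i)$ as $n \to \infty$. By continuity of $\frac{\partial F_1}{\partial l}(s,l)$ and $\frac{\partial^2 F_1}{\partial l^2}(s,l)$, one has $\frac{\partial F_1}{\partial l}(\bar s_i, l_i^*(\bar s_i)) = 0$ and $\frac{\partial^2 F_1}{\partial l^2}(\bar s_i, l_i^*(\bar s_i)) \leq 0$. Let us now consider $l_{3-i}^*(\bar s_i)$, defined as the limit of a converging subsequence of $(l_{3-i}^*(s_n))_n$ in case $\bar s_{3-i} = \bar s_i$. If $l_1^*(\bar s_i) = l_2^*(\bar s_i)$, then from the existence of a local minimum $l \in (l_1^*(s_n), l_2^*(s_n))$ such that $\frac{\partial^2 F_1}{\partial l^2}(s_n, l) \geq 0$, we conclude that $\frac{\partial^2 F_1}{\partial l^2}(\bar s_i, l_i^*(\bar s_i)) = 0$. If $l_1^*(\bar s_i) < l_2^*(\bar s_i)$ and both $\frac{\partial^2 F_1}{\partial l^2}(\bar s_i, l_1^*(\bar s_i))$ and $\frac{\partial^2 F_1}{\partial l^2}(\bar s_i, l_2^*(\bar s_i))$ are negative, then, using the implicit function theorem, we contradict the maximality of $\bar s_i$. This concludes the proof of (74).

Let us consider a point $(s,l)$ such that $\frac{\partial F_1}{\partial l}(s,l) = \frac{\partial^2 F_1}{\partial l^2}(s,l) = 0$, where $s \in [0, s_0] \cup [s_1, 1]$ and $l^2 < \frac{2}{1-s}$. From $\frac{\partial F_1}{\partial l}(s,l) = 0$, we get: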

Fi{s,l) = - ^ ,L r (-\/— exp ( ] +1^ 



l^{l-s) y y TT ^ V 8s J \ 2^, 
From $\frac{\partial^2 F_1}{\partial l^2}(s,l) = 0$, which implies $\chi(s,l) = 0$ (since $\frac{\partial F_1}{\partial l}(s,l) = 0$), we get:

(75) ^ -ir7= = 7 ,2 . „ . xhz^^P ' 



2^sj I /2(l -s) - 4V 27r ^ V 8s 
By combining these two relations, we have 

]^ I /2s 




Finally, using the expression for $F_1(s,l)$, we get:

Using again (75), this yields

exp -— +(l-2s)exp ^- — ^ U - /^s 



i{i-s) rv (I 



which implies

(76) (1 - 2.)* - = -j^p^J-^^-exp (--(1 - 2.y^ 

We notice that the right-hand side is negative, so that this equation has no solution if $1 - 2s > 0$, which leads to a contradiction with (74) in the case $s \in [0, s_0]$. In conclusion, in the case $s \in (0, s_0)$, $l \mapsto F_1(s,l)$ admits only one local maximum at the point $l^*(s)$, which is also a global maximum.

A.2.2. The case $s \in (s_1, 1)$. In the case $s \in (s_1, 1)$, we need another argument.

Lemma 7. Let us consider $s \in (s_1, 1)$ and $l \in [0, l_0(s)]$ such that $\frac{\partial F_1}{\partial l}(s,l) = \frac{\partial^2 F_1}{\partial l^2}(s,l) = 0$. Then, $l < l_-(s)$.

Proof. We know from the previous computations that $(s,l)$ satisfies (76). Using the lower bound in the classical inequality (37), we get

^ , A ^( K2s-1)\ / l\2s-l) 



;2 



2^s " V 2^s 1 + V 8s ^ 



From (76), we thus obtain (since $1 - 2s < 0$)
2P{l-s)-3 rV_f X ^ f P{2s-1) 



exp --(1 - 2s)^ < (1 - 2s) ji^-Twexp 



/4_;2(i_5)Y27r 8s^ -"//-v^ P(2.-i)2 \ 85 / 

which implies

$$\frac{l^2(1-s) - 3}{4 - l^2(1-s)} < -\frac{l^2(2s-1)^2}{4s + l^2(2s-1)^2}$$

and then (since $l^2(1-s) < 2$),

$$\left(l^2(1-s) - 3\right)\left(4s + l^2(2s-1)^2\right) < -l^2(2s-1)^2\left(4 - l^2(1-s)\right).$$

This implies that

$$l^2 < 12 s.$$

On the other hand, it is easy to check that

$$(l_-)^2 > 12 s.$$

Indeed

$$(l_-)^2 > 12 s \iff 2\left(\frac{1}{1-s} + s\right) - \frac{2}{|1-s|}\left(s^2(1-s)^2 - 10 s(1-s) + 1\right)^{1/2} > 12 s$$

$$\iff 1 - 5 s(1-s) > \left(s^2(1-s)^2 - 10 s(1-s) + 1\right)^{1/2}$$

$$\iff 1 - 10 s(1-s) + 25 s^2(1-s)^2 > s^2(1-s)^2 - 10 s(1-s) + 1,$$

which is obviously true. Thus (76) implies $l < l_-$.
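The two algebraic facts used at the end of this proof can be checked numerically. The sketch below is an illustration under assumptions: it reads the cross-multiplied inequality as $(l^2(1-s) - 3)(4s + l^2(2s-1)^2) < -l^2(2s-1)^2(4 - l^2(1-s))$, whose two sides differ by exactly $12s - l^2$ because $4s(1-s) + (2s-1)^2 = 1$, and it uses the smaller root $y_- = l_-^2$ of $Q$. Helper names are hypothetical.

```python
import math

def l_minus_sq(s):
    """Smaller root y_minus = l_minus^2 of Q(s, .), for s(1-s) < 1/10."""
    e = s * (1 - s)
    return 2 * (1 / (1 - s) + s) - 2 / abs(1 - s) * math.sqrt(e * e - 10 * e + 1)

def cross_multiplied_gap(s, u):
    """Right-hand side minus left-hand side of the inequality, with u = l^2."""
    lhs = (u * (1 - s) - 3) * (4 * s + u * (2 * s - 1) ** 2)
    rhs = -u * (2 * s - 1) ** 2 * (4 - u * (1 - s))
    return rhs - lhs

u = 5.0  # an arbitrary value of l^2 used for the comparison
for s in (0.89, 0.93, 0.97, 0.999):
    # Second column is positive (l_minus^2 > 12 s); last two columns coincide.
    print(s, l_minus_sq(s) - 12 * s, cross_multiplied_gap(s, u), 12 * s - u)
```

The gap $l_-^2 - 12s$ is positive on $(s_1, 1)$ and shrinks to zero as $s \to 1$, consistent with $l_-(1) = \sqrt{12}$.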

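Let us now assume the existence of $\underline{s} \in (s_1, 1)$ such that $l \mapsto F_1(\underline{s},l)$ admits two local maxima $l_1(\underline{s}) < l_2(\underline{s})$. We recall that necessarily, $\chi(\underline{s}, l_-(\underline{s})) \geq 0$ and $l_1(\underline{s}) < l_-(\underline{s}) < l_2(\underline{s})$. Lemma 9 below shows that $\frac{d}{ds}\chi(s, l_-(s)) > 0$ for $s \in (s_1, 1)$. This implies that $\forall s \in (\underline{s}, 1]$, $\chi(s, l_-(s)) > 0$. Using the implicit function theorem, we can construct a continuous curve $l_2^*(s)$ on a maximal interval of the form $s \in [\underline{s}, \bar s)$ with $\bar s > \underline{s}$ such that for $s \in [\underline{s}, \bar s)$, $\frac{\partial F_1}{\partial l}(s, l_2^*(s)) = 0$, $\frac{\partial^2 F_1}{\partial l^2}(s, l_2^*(s)) < 0$ and thus $\chi(s, l_2^*(s)) < 0$. Due to the respective signs of the continuous function $\chi(s,l)$ on the two continuous curves $s \mapsto l_2^*(s)$ and $s \mapsto l_-(s)$, these curves cannot intersect on $[\underline{s}, \min(\bar s, 1))$. Therefore, $\forall s \in [\underline{s}, \min(\bar s, 1))$, $l_2^*(s) > l_-(s)$. We now distinguish between three cases.

If $\bar s > 1$, then $l_2^*(1) \geq l_-(1) = \sqrt{12}$ whereas $\frac{\partial F_1}{\partial l}(1, l_2^*(1)) = 0$ and $\frac{\partial^2 F_1}{\partial l^2}(1, l_2^*(1)) < 0$, so that we contradict (69).

If $\bar s < 1$, then since $l_2^*(s) < l_0(s) = \sqrt{\frac{2}{1-s}}$, we may find an increasing sequence $(s_n)_{n \in \mathbb{N}}$ of elements of $[\underline{s}, \bar s)$ converging to $\bar s$ and such that $l_2^*(s_n)$ converges to some limit denoted by $l_2^*(\bar s)$ and which belongs to $[l_-(\bar s), l_0(\bar s)]$. By continuity of $\frac{\partial F_1}{\partial l}(s,l)$ and $\frac{\partial^2 F_1}{\partial l^2}(s,l)$, one has $\frac{\partial F_1}{\partial l}(\bar s, l_2^*(\bar s)) = 0$, $\frac{\partial^2 F_1}{\partial l^2}(\bar s, l_2^*(\bar s)) \leq 0$ and thus $\chi(\bar s, l_2^*(\bar s)) \leq 0$. This implies that $l_2^*(\bar s) > l_-(\bar s)$ since $\chi(\bar s, l_-(\bar s)) > 0$. In turn, this implies, by Lemma 7, that $\frac{\partial^2 F_1}{\partial l^2}(\bar s, l_2^*(\bar s)) < 0$. Combining the implicit function theorem with the uniqueness of local maxima of $l \mapsto F_1(s,l)$ for $l > l_-(s)$, we contradict the maximality of $\bar s$.

Let us finally consider the case $\bar s = 1$. We are going to check that $\frac{\partial F_1}{\partial l}(s,l)$ is negative for $l$ large uniformly in $s \in (s_1, 1)$ (see Lemma 8) so that $l_2^*(s)$ remains bounded in the limit $s \to 1$. This implies that we may find an increasing sequence $(s_n)_{n \in \mathbb{N}}$ of elements of $[\underline{s}, 1)$ converging to 1 and such that $l_2^*(s_n)$ converges to some limit denoted by $l_2^*(1) \geq l_-(1) = \sqrt{12}$. By continuity of $\frac{\partial F_1}{\partial l}(s,l)$ and $\frac{\partial^2 F_1}{\partial l^2}(s,l)$, one has $\frac{\partial F_1}{\partial l}(1, l_2^*(1)) = 0$ and $\frac{\partial^2 F_1}{\partial l^2}(1, l_2^*(1)) \leq 0$, but this contradicts (69), and concludes the proof of Lemma 3.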



Lemma 8. There exist $L > 0$ and $\alpha < 0$ such that, for all $l > L$ and for all $s \in (s_1, 1)$,
dl 



Proof. Let $s \in (s_1, 1)$. By (71) and nonnegativity of $F_1$, one has




l-s \ 2' V 2 J J \ 2^ 



2/(2s-l) fP{s-l)\ [trs [ dx ( P 



exp / exp exp 



l-s V 2 J J i(2j-i) \ 2 J ^J2tx V2tt \ 8s 

Using two integrations by parts, one obtains 

^ , 8s3/2 \ / P{2s-iy\ (2J-S 8s3/2 , , . 
e 2dx>\ TT^e^ - -.N. exp " ^ 1^ exp -- 



^(2.-1) - I /(2s - 1) P(2s - 1)3 / \ 8s J \ I P V 8s 



and 
with the $O$ term uniform in $s \in (s_1, 1)$. Using the fact that

' 1 + -(1 - s) + (1 - 2s)exp ' ^ ' 



l-s \ 2' ' ^ V 2 

,3 + 4,expfQ£^VT^fl-exp 



we get, since s < 1, 



0< fl + ?(l-s) + (l-2s)exp f^^^V^l 1 <2«3 + 4/. 



l-s \ 2 

Thus we get: 



+ 4lexp^^^j+^(^l-exp^- 



<(Z3 + 4lexp(-^]+^(l-exp(^^^U^-^ + e;(. 



2l{2s-l) ( 2^s 8s3/2 2/(2g- 1) ( f{s-l) \ ( 2^s Ss^/^^ ^ 

H ; 6xp I - ) I ; zt y s 



l-s \^z(2s-i) z3(2s-i)3y l-s "^y 2 jy I 

- , (^) . (: - (^)) - (^) ) 



l-s |2(2s- 1)2(1 _s) 2 y l-s 2 y /2(l_s) 



Therefore, one concludes that 



;^(s-l)^ 16s3/^4s(l - s) /I 

^2(25 - 1)2(1 -s)^V^2'- 



which indeed shows that $\frac{\partial F_1}{\partial l}(s,l)$ is negative for $l$ large uniformly in $s \in (s_1, 1)$.

To conclude the proof, we need to prove the following lemma, which has been used above.

Lemma 9. The function $s \mapsto \frac{d}{ds}\chi(s, l_-(s))$ is positive for $s \in (s_1, 1)$.

Proof. Let us consider the derivative $\frac{d}{ds}\chi(s, l_-(s))$. Using the fact that $\frac{\partial \chi}{\partial l}(s, l_-(s)) = 0$, we obtain that

ix{s,l-{s)) = ^{S,l-{S)) = ;^exp (-^) (^2_,p!4)2,3/2;J(^) 

where 

as) = ^-^^-^'7^"'^' + 2/^s^ - + I) (^-(1 - ^) - - - 4) 
where, here and in the following, $l_-$ should be understood as $l_-(s)$. Notice that $\frac{d}{ds}\chi(s, l_-(s))$ has the same sign as $\xi(s)$. By simple computations, we get:

i{s) = — ^ + 2f_s^ -[- + — ]{lt{l-sf- 10/2 (1 -s) + 24) 



= '''^ - -lt{l -s) + l'i+ 2/2 _ - sf + 5s/2 (1 -s)- 12s. 

8 4 2 

By using the fact that $Q(s, l_-^2) = 0$, namely $l_-^4 = 4\left(\frac{1}{1-s} + s\right) l_-^2 - \frac{48 s}{1-s}$, to rewrite the term proportional to $l_-^6$, we obtain

$$\xi(s) = -s\, l_-^2 (1-s) - \frac{1}{4}\, l_-^4 (1-s) + l_-^2 + 2\, l_-^2 s^2 - 12 s$$

so that, using again $l_-^4 = 4\left(\frac{1}{1-s} + s\right) l_-^2 - \frac{48 s}{1-s}$ to rewrite the term proportional to $l_-^4$,

$$\xi(s) = 2 s\, l_-^2 (2s - 1)$$

which is positive for $s \in (s_1, 1)$. This concludes the proof. $\square$
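The last substitution can be verified numerically. The sketch below assumes the intermediate expression $\xi(s) = -s\,l_-^2(1-s) - \frac{1}{4} l_-^4(1-s) + l_-^2 + 2 l_-^2 s^2 - 12s$ and the relation $Q(s, l_-^2) = 0$ with constant term $\frac{48s}{1-s}$; both are reconstructions, and the function names are illustrative.

```python
import math

def l_minus_sq(s):
    """Smaller root y = l_minus^2 of Q(s, y) = y^2 - 4(1/(1-s) + s) y + 48 s/(1-s)."""
    e = s * (1 - s)
    return 2 * (1 / (1 - s) + s) - 2 / abs(1 - s) * math.sqrt(e * e - 10 * e + 1)

def xi_intermediate(s, y):
    """xi(s) before the last substitution, written with y = l_minus^2."""
    return -s * y * (1 - s) - 0.25 * y * y * (1 - s) + y + 2 * y * s * s - 12 * s

def xi_final(s, y):
    """xi(s) after eliminating l_minus^4 through Q(s, l_minus^2) = 0."""
    return 2 * s * y * (2 * s - 1)

for s in (0.9, 0.95, 0.99):
    y = l_minus_sq(s)
    print(s, xi_intermediate(s, y), xi_final(s, y))
```

The two columns agree up to rounding, and the common value is positive because $s_1 > \frac{1}{2}$ implies $2s - 1 > 0$ on $(s_1, 1)$.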